Abstract: Suppose that we observe independent random pairs $(X_1,Y_1)$, $(X_2,Y_2)$, ..., $(X_n,Y_n)$. Our goal is to estimate regression functions such as the conditional mean or $\beta$-quantile of $Y$ given $X$, where $0 < \beta < 1$. In order to achieve this we minimize criteria such as, for instance,

$$\sum_{i=1}^n \rho(f(X_i) - Y_i) + \lambda \cdot \mathrm{TV}(f)$$

among all candidate functions $f$. Here $\rho$ is some convex function depending on the particular regression function we have in mind, $\mathrm{TV}(f)$ stands for the total variation of $f$, and $\lambda > 0$ is some tuning parameter. This framework is extended further to include binary or Poisson regression, and to include localized total variation penalties. The latter are needed to construct estimators adapting to inhomogeneous smoothness of $f$. For the general framework we develop noniterative algorithms for the solution of the minimization problems which are closely related to the taut string algorithm (cf. Davies and Kovac, 2001). Further we establish a connection between the present setting and monotone regression, extending previous work by Mammen and van de Geer (1997). The algorithmic considerations and numerical examples are complemented by two consistency results.
1. Introduction

Suppose that we observe pairs $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$ with fixed numbers $x_1 \le x_2 \le \cdots \le x_n$ and independent random variables $Y_1, Y_2, \ldots, Y_n$. We assume that the distribution function of $Y_i$ depends on $x_i$, i.e.

$$P(Y_i \le z) = F(z \mid x_i)$$

for some unknown family of distribution functions $F(\cdot \mid x)$, $x \in \mathbb{R}$. Often one is interested in certain features of these distribution functions. Examples are the mean function $\mu$ with

$$\mu(x) := \int y \, F(dy \mid x)$$

and, for some $\beta \in (0, 1)$, the $\beta$-quantile function $Q_\beta$, where $Q_\beta(x)$ is any number $z$ such that

$$F(z- \mid x) \le \beta \le F(z \mid x).$$

This paper treats estimation of such regression functions utilizing certain roughness penalties. The literature about penalized regression estimators is vast and still growing. As a good starting point we recommend Antoniadis and Fan (2001), van de Geer (2001), Huang (2003), and the references therein. A first possibility is to minimize a functional of the form

$$T(f) := \sum_{i=1}^n \rho(f(x_i) - Y_i) + \lambda \cdot \mathrm{TV}(f) \qquad (1)$$

over all functions $f$ on the real line. Here $\rho$ is some convex function measuring the size of the residual $Y_i - f(x_i)$ and depending on the particular feature we have in mind. Moreover, $\mathrm{TV}(f)$ denotes the total variation of $f$, that is the supremum of $\sum_{j=1}^{m-1} |f(z_{j+1}) - f(z_j)|$ over all integers $m > 1$ and numbers $z_1 < z_2 < \cdots < z_m$, while $\lambda > 0$ is some tuning parameter.

Example I (means). In order to estimate the mean function $\mu$, one can take

$$\rho(z) := z^2/2.$$

This particular case has been treated in detail by Mammen and van de Geer (1997) and Davies and Kovac (2001); see also the remark following Lemma 2.2. In particular, the latter authors describe an algorithm with running time $O(n)$, the taut string method, to minimize the functional $T$ above.

Example II (quantiles). For the estimation of a quantile function $Q_\beta$ one can take

$$\rho(z) = \rho_\beta(z) := |z|/2 - (\beta - 1/2) z = \begin{cases} (1 - \beta) z & \text{if } z \ge 0, \\ \beta |z| & \text{if } z \le 0. \end{cases}$$
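As a quick numerical sanity check of the two expressions for $\rho_\beta$, the following minimal Python sketch evaluates both forms and confirms on a toy sample that minimizing $\sum_i \rho_\beta(c - Y_i)$ over constants $c$ picks out a $\beta$-quantile. The function names and the toy data are illustrative assumptions and do not come from the paper.

```python
def rho_beta(z, beta):
    """Check function in the form |z|/2 - (beta - 1/2) z."""
    return abs(z) / 2 - (beta - 0.5) * z


def rho_beta_piecewise(z, beta):
    """The same function written piecewise."""
    return (1 - beta) * z if z >= 0 else beta * abs(z)


def best_constant(Y, beta):
    """Sample point c minimizing sum_i rho_beta(c - Y_i).

    Only sample points are tried, which suffices because the criterion
    is piecewise linear with kinks at the observations.
    """
    return min(Y, key=lambda c: sum(rho_beta(c - y, beta) for y in Y))


Y = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical toy sample
print(best_constant(Y, 0.3))    # a 0.3-quantile of the sample
```

For this sample the only number $z$ with $F(z-) \le 0.3 \le F(z)$ is $z = 2$, which is what the sketch returns.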
Of particular interest is the case $\beta = 1/2$. Then $\rho(z) = |z|/2$, and $Q_{1/2}$ is the conditional median function. This particular functional has also been suggested by Simpson, He and Liu in their discussion of Chu et al. (1998) but, as far as we know, has not been considered in more detail later on. However, similar functionals using a discretisation of the total variation of the first derivative as a penalty have been studied by Koenker, Ng and Portnoy (1994) or in two dimensions by Koenker and Mizera (2004). They employ linear programming techniques like interior point methods to find solutions to the resulting minimisation problems.

A primary goal of the present paper is to extend the classical taut string algorithm to other situations such as Example II, or binary and Poisson regression. Compared to the linear programming techniques mentioned above, the generalized taut string method has the advantage of being computationally faster and more stable. In the specific case of Example II it is possible to calculate a solution in time $O(n \log(n))$. Note that the original algorithm yields piecewise constant functions. On each constant interval the function value is equal to the mean of the corresponding observations, except for local extrema of the fit. In their discussion of Davies and Kovac (2001), Mammen and van de Geer mention the possibility of simply replacing sample means by sample quantiles in order to treat Example II. However, the present authors realized that the extension is not that straightforward.

The remainder of this paper is organized as follows: In Section 2 we describe an extension of the functional $T$ above such that it covers also other models such as binary and Poisson regression. In addition we replace the penalty term $\lambda \cdot \mathrm{TV}(f)$ by a more flexible roughness measure which allows local adaptation to varying smoothness of the underlying regression function. Then we derive necessary and sufficient conditions for a function $f$ to minimize our functional. In that context we also establish a connection to monotone regression which is useful for understanding adaptivity properties of the procedure. This generalizes findings of Mammen and van de Geer (1997) for the least squares case. In Sections 3 and 4 we derive generalized taut string algorithms, extending the algorithm described by Davies and Kovac (2001). While Section 3 covers continuously differentiable functions $\rho$, Section 4 is for general $\rho$ and, in particular, Example II. Section 5 explains how our tuning parameters, e.g. $\lambda$ in (1), may be chosen. Section 6 presents some numerical examples of our methods. In Section 7 we complement the algorithmic considerations with two consistency results which are of independent interest. One of them entails uniform consistency of monotone regression estimators, while the other applies to arbitrary estimators such that the corresponding residuals satisfy a certain multiscale criterion. Both results are a first step towards a detailed asymptotic analysis of taut string and related methods. All longer proofs are deferred to Section 8.

2. The general setting

For simplicity we assume throughout this paper that $x_1 < x_2 < \cdots < x_n$. In Section 3.3 we describe briefly possible modifications to deal with potential ties among the $x_i$.
2.1. The target functional

We often identify a function $f : \mathbb{R} \to \mathbb{R}$ with the vector $f = (f_i)_{i=1}^n := (f(x_i))_{i=1}^n$. Our aim is to minimize functionals of the form

$$T(f) = T_\lambda(f) := \sum_{i=1}^n R_i(f_i) + \sum_{j=1}^{n-1} \lambda_j |f_{j+1} - f_j|$$

over all vectors $f \in \mathbb{R}^n$, where $\lambda \in (0, \infty)^{n-1}$ is a given vector of tuning parameters while the $R_i$ are random functions depending on the data. In general we assume the following two conditions to be satisfied:

(A.1) For each index $i$, the function $R_i : \mathbb{R} \to \mathbb{R}$ is convex.
(A.2) $T$ is coercive, i.e. $T(f) \to \infty$ as $\|f\| \to \infty$.

Condition (A.1) entails that $T$ is a continuous and convex functional on $\mathbb{R}^n$, so the additional Condition (A.2) guarantees the existence of a minimizer $\hat f$ of $T$. This will be our estimator for the regression function of interest, evaluated at the design points $x_i$.

The special functional $T$ in (1) corresponds to $\lambda_1 = \cdots = \lambda_{n-1} = \lambda$ and $R_i(z) := \rho(z - Y_i)$. Here our Conditions (A.1-2) are satisfied if $\rho(z) \to \infty$ as $|z| \to \infty$. Two additional examples for $R_i$ follow.
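To make the target functional concrete, here is a minimal Python sketch that evaluates the sum of the data terms $R_i(f_i)$ plus the weighted absolute jumps $\lambda_j |f_{j+1} - f_j|$ for a given vector. The helper name `target` and the two-point toy data are assumptions made for illustration only.

```python
def target(f, R, lam):
    """Evaluate sum_i R_i(f_i) + sum_j lam_j * |f_{j+1} - f_j|.

    f   : list of n fitted values
    R   : list of n callables, R[i](z) playing the role of R_{i+1}(z)
    lam : list of n - 1 positive penalty weights
    """
    fit = sum(R_i(f_i) for R_i, f_i in zip(R, f))
    penalty = sum(l * abs(b - a) for l, a, b in zip(lam, f[:-1], f[1:]))
    return fit + penalty


# Least-squares data terms R_i(z) = (z - Y_i)^2 / 2 for toy data Y.
Y = [0.0, 2.0]
R = [lambda z, y=y: (z - y) ** 2 / 2 for y in Y]
lam = [0.5]

print(target([0.5, 1.5], R, lam))  # 0.75
print(target([0.0, 2.0], R, lam))  # interpolating the data costs 1.0
print(target([1.0, 1.0], R, lam))  # the constant mean also costs 1.0
```

In this two-point example the vector $(0.5, 1.5)$ attains a smaller value than either interpolating the data or fitting a constant, which illustrates the trade-off encoded in the penalty.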

Example III (Poisson regression). Suppose that $Y_i \in \{0, 1, 2, \ldots\}$ has a Poisson distribution with mean $\mu(x_i) > 0$, and let

$$R_i(z) := \exp(z) - z Y_i.$$

These functions are strictly convex with $R_i \ge Y_i \log(e/Y_i)$. Thus $T$ is even strictly convex, and elementary considerations reveal that it is coercive if $Y_i > 0$ for at least one index $i$. In that case we end up with a unique penalized maximum likelihood estimator $\hat f$ of $\log \mu$.

Example IV (Binary regression). Similarly let $Y_i \in \{0, 1\}$ with mean $\mu(x_i) \in (0, 1)$, and define

$$R_i(z) := -Y_i z + \log(1 + \exp(z)).$$

Here $R_i > 0$, and again $T$ is strictly convex. It is coercive if the $Y_i$ are not all identical, and the minimizer $\hat f$ of $T$ may be viewed as a penalized maximum likelihood estimator of $\mathrm{logit}(\mu) := \log(\mu/(1 - \mu))$.

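2.2. Characterizations of the solution

As mentioned before, Conditions (A.1-2) guarantee the existence of a minimizer $\hat f$ of $T$. In the present subsection we derive various characterizations of such minimizers, assuming only Condition (A.1).

By convexity of $T$, a vector $f \in \mathbb{R}^n$ minimizes $T$ if, and only if, all directional derivatives at $f$ are non-negative, i.e.

$$DT(f, \delta) := \lim_{t \downarrow 0} \frac{T(f + t\delta) - T(f)}{t} \ge 0 \quad \text{for any } \delta \in \mathbb{R}^n. \qquad (2)$$

More specifically, let $R_i'(z\pm)$ be the left- and right-sided derivatives of $R_i$ at $z$,

$$R_i'(z\pm) := \lim_{\varepsilon \downarrow 0} \frac{R_i(z \pm \varepsilon) - R_i(z)}{\pm\varepsilon}.$$

Then $DT(f, \delta)$ equals

$$\sum_{i:\,\delta_i > 0} R_i'(f_i+)\,\delta_i + \sum_{i:\,\delta_i < 0} R_i'(f_i-)\,\delta_i + \sum_{j=1}^{n-1} \lambda_j \bigl( \mathrm{sign}(f_{j+1} - f_j)(\delta_{j+1} - \delta_j) + 1\{f_{j+1} = f_j\}\,|\delta_{j+1} - \delta_j| \bigr),$$

where $1\{\ldots\}$ denotes the indicator function.

Plugging in various special vectors $\delta$ reveals valuable information about minimizers $\hat f$. In particular, for indices $1 \le j \le k \le n$ let

$$\delta^{(jk)} := \bigl( 1\{j \le i \le k\} \bigr)_{i=1}^n.$$

Then

$$DT(f, +\delta^{(jk)}) = \sum_{i=j}^k R_i'(f_i+) - \lambda_{j-1}\,\underline{\mathrm{sign}}(f_{j-1} - f_j) - \lambda_k\,\underline{\mathrm{sign}}(f_{k+1} - f_k),$$

$$DT(f, -\delta^{(jk)}) = -\sum_{i=j}^k R_i'(f_i-) + \lambda_{j-1}\,\overline{\mathrm{sign}}(f_{j-1} - f_j) + \lambda_k\,\overline{\mathrm{sign}}(f_{k+1} - f_k).$$

Here

$$\underline{\mathrm{sign}}(z) := 1\{z > 0\} - 1\{z \le 0\} \quad\text{and}\quad \overline{\mathrm{sign}}(z) := 1\{z \ge 0\} - 1\{z < 0\},$$

and throughout this paper we set $v_0 := v_{m+1} := 0$ for any vector $v = (v_i)_{i=1}^m \in \mathbb{R}^m$. In particular, $\lambda_0 := \lambda_n := 0$. Consequently, applying (2) to $\pm\delta^{(jk)}$ yields the key inequalities

$$\sum_{i=j}^k R_i'(f_i+) \ge \lambda_{j-1}\,\underline{\mathrm{sign}}(f_{j-1} - f_j) + \lambda_k\,\underline{\mathrm{sign}}(f_{k+1} - f_k) \quad\text{and}$$
$$\sum_{i=j}^k R_i'(f_i-) \le \lambda_{j-1}\,\overline{\mathrm{sign}}(f_{j-1} - f_j) + \lambda_k\,\overline{\mathrm{sign}}(f_{k+1} - f_k). \qquad (3)$$

These considerations yield already one part of the following result.

Lemma 2.1 A vector $f \in \mathbb{R}^n$ minimizes $T$ if, and only if, (3) holds for all $1 \le j \le k \le n$.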
In case of differentiable functions $R_i$ there is a simpler characterization of a minimizer of $T$:

Lemma 2.2 Suppose that all functions $R_i$ are differentiable on $\mathbb{R}$. Then a vector $f \in \mathbb{R}^n$ minimizes $T$ if, and only if, for $1 \le k \le n$,

$$\sum_{i=1}^k R_i'(f_i) \begin{cases} \in [-\lambda_k, \lambda_k], \\ = \lambda_k & \text{if } k < n \text{ and } f_k < f_{k+1}, \\ = -\lambda_k & \text{if } k < n \text{ and } f_k > f_{k+1}. \end{cases} \qquad (4)$$

For $k = n$, it follows from $\lambda_n = 0$ that Condition (4) amounts to

$$\sum_{i=1}^n R_i'(f_i) = 0.$$

Note that in the classical case, $R_i(z) = (z - Y_i)^2/2$ and $\sum_{i=1}^k R_i'(f_i) = \sum_{i=1}^k f_i - \sum_{i=1}^k Y_i$. Thus our result entails Mammen and van de Geer's (1997) finding that the solution $\hat f$ may be represented as the derivative of a taut string connecting the points $(0, 0)$ and $(n, \sum_{i=1}^n Y_i)$ and forced to lie within a tube centered at the points $(k, \sum_{i=1}^k Y_i)$, $1 \le k < n$.

In the general setting treated here, there are no longer taut strings, but the solutions can still be characterized by a tube. This is illustrated in the left panels of Figure 1 with a small example. The upper panel shows a data set of size $n = 25$ (with $x_i = i$) and the approximation $\hat f$ obtained from the functional $T$ in (1) with $\rho(z) := \sqrt{0.1^2 + z^2}$ and $\lambda = 2$. This function $\rho(z)$ may be viewed as a smoothed version of $|z|$ with

$$R_i'(z) = \frac{z - Y_i}{\sqrt{0.1^2 + (z - Y_i)^2}}$$

being similar to $\mathrm{sign}(z - Y_i)$. The lower panel shows the cumulative sums of the "residuals" $R_i'(\hat f_i)$. As predicted by Lemma 2.2, these sums are always between $-\lambda$ and $\lambda$, and they touch these boundaries whenever the value of $\hat f$ changes.
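The tube characterization of Lemma 2.2 is easy to test numerically. The sketch below, written for the least-squares case $R_i'(z) = z - Y_i$ only, checks whether the cumulative sums of the residuals stay in the tube and touch the correct boundary wherever the candidate changes its value. The function name, the tolerance and the toy data are assumptions for illustration.

```python
def satisfies_tube_condition(f, Y, lam, tol=1e-9):
    """Check the conditions of Lemma 2.2 for R_i(z) = (z - Y_i)^2 / 2.

    f, Y : lists of length n
    lam  : list of the n - 1 penalty weights lambda_1..lambda_{n-1};
           lambda_n is taken to be 0 as in the text.
    """
    n = len(f)
    s = 0.0  # running sum of R_i'(f_i) = f_i - Y_i
    for k in range(1, n + 1):
        s += f[k - 1] - Y[k - 1]
        lam_k = lam[k - 1] if k < n else 0.0
        if abs(s) > lam_k + tol:
            return False              # the sum leaves the tube
        if k < n and f[k - 1] < f[k] and abs(s - lam_k) > tol:
            return False              # upward jump must touch +lambda_k
        if k < n and f[k - 1] > f[k] and abs(s + lam_k) > tol:
            return False              # downward jump must touch -lambda_k
    return True


Y = [0.0, 2.0]
print(satisfies_tube_condition([0.5, 1.5], Y, [0.5]))  # True
print(satisfies_tube_condition([0.0, 2.0], Y, [0.5]))  # False
```

Such a check only verifies a given candidate; it does not compute the minimizer.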



2.3. Bounding the range of the solutions

Sometimes it is helpful to know a priori some bounds for any minimizer $\hat f$ of $T$. We start with a rather obvious fact: Suppose that there are numbers $z_\ell < z_r$ such that for $i = 1, \ldots, n$,

$$R_i'(z+) < 0 \quad\text{if } z < z_\ell,$$
$$R_i'(z-) > 0 \quad\text{if } z > z_r.$$

Then any minimizer of $T$ belongs to $[z_\ell, z_r]^n$. For if $f \in \mathbb{R}^n \setminus [z_\ell, z_r]^n$, one can easily verify that replacing $f$ with $(\min(\max(f_i, z_\ell), z_r))_{i=1}^n$ yields a strictly smaller value of $T(f)$. In case of differentiable functions $R_i$ an even stronger statement is true:



L. Dümbgen and A. Kovac / Extensions of smoothing via taut strings

Fig 1. Illustration of Lemma 2.2 and Conditions (C.1$_K$) and (C.2$_{f\&g,K}$) of Algorithm I.

Lemma 2.3 Suppose that the functions $R_i$ are differentiable such that for certain numbers $z_\ell < z_r$,

$$R_i'(z_\ell) \le 0 \quad\text{for all } i \text{ with at least one strict inequality},$$
$$R_i'(z_r) \ge 0 \quad\text{for all } i \text{ with at least one strict inequality}.$$

Then any minimizer of $T$ is contained in $(z_\ell, z_r)^n$.

In the special case of $R_i(z) = \rho(z - Y_i)$ we reach the following conclusions: If $0$ is the unique minimizer of $\rho$ over $\mathbb{R}$, then any minimizer of $T$ belongs to

$$[\min(Y_1, \ldots, Y_n), \max(Y_1, \ldots, Y_n)]^n.$$

If in addition $\rho$ is differentiable and $Y = (Y_i)_{i=1}^n$ is non-constant, then any minimizer of $T$ belongs even to

$$(\min(Y_1, \ldots, Y_n), \max(Y_1, \ldots, Y_n))^n.$$

2.4. A link to monotone regression

An interesting alternative to smoothness assumptions and roughness penalties is to assume monotonicity of the underlying regression function $f$ on certain intervals. For instance, if $f$ is assumed to be isotonic, one could determine an estimator $\hat f$ minimizing $\sum_{i=1}^n R_i(f_i)$ under the constraint that $f_1 \le f_2 \le \cdots \le f_n$. The next theorem shows that our penalized estimators $\hat f$ often coincide locally with monotone estimators.

Theorem 2.4 Suppose that $1 < a \le b < n$ such that

$$\lambda_{a-1} = \lambda_a = \cdots = \lambda_b \quad\text{and}\quad \hat f_{a-1} < \hat f_a \le \cdots \le \hat f_b < \hat f_{b+1}.$$

Then $(\hat f_i)_{i=a}^b$ minimizes $\sum_{i=a}^b R_i(f_i)$ among all vectors $(f_i)_{i=a}^b$ satisfying $f_a \le \cdots \le f_b$. An analogous statement holds for antitonic fits.
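For orientation, the constrained problem appearing in Theorem 2.4 reduces in the least-squares case to classical isotonic regression, which can be computed by the well-known pool-adjacent-violators scheme. The sketch below is a textbook version of that scheme and is not taken from the paper; the function name is an assumption.

```python
def isotonic_least_squares(y):
    """Minimize sum_i (f_i - y_i)^2 / 2 subject to f_1 <= ... <= f_n.

    Pool-adjacent-violators: keep a stack of blocks [sum, count] and
    merge the last two blocks while their means are in the wrong order.
    """
    blocks = []
    for value in y:
        blocks.append([value, 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # each block is fitted by its mean
    return fit


print(isotonic_least_squares([1.0, 3.0, 2.0]))  # [1.0, 2.5, 2.5]
```

The merging of neighbouring blocks until monotonicity holds is the same mechanism that reappears, with shifted targets, in the segment merging of the taut string type algorithms discussed later.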

3. Computation in case of regular functions $R_i$

3.1. A general algorithm for strongly coercive functions $R_i$

In this subsection we present an algorithm for the minimization of the functional $T$ in case of (A.1) together with the additional constraint that

(A.3) all functions $R_i$ are continuously differentiable with

$$\lim_{z \to -\infty} R_i'(z) = -\infty \quad\text{and}\quad \lim_{z \to \infty} R_i'(z) = \infty.$$

Obviously the latter condition is satisfied in Example I. Note that Conditions (A.1) and (A.3) imply Condition (A.2). Moreover, $R_i' : \mathbb{R} \to \mathbb{R}$ is isotonic (i.e. non-decreasing), continuous and surjective.

The algorithm's principle. The idea of our algorithm is to compute inductively for $K = 1, 2, \ldots, n$ a vector $(f_i)_{i=1}^K$ such that Condition (4) holds for $1 \le k \le K$, where $f_{K+1}$ may be defined arbitrarily.

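Precisely, inductively for $K = 1, 2, \ldots, n$ we compute two candidate vectors $f = (f_i)_{i=1}^K$ and $g = (g_i)_{i=1}^K$ in $\mathbb{R}^K$ such that at the end of step $K$ the following three conditions are satisfied:

(C.1$_K$) There exists an index $k_o = k_o(f, g) \in \{0, 1, \ldots, K\}$ such that

$$g_i - f_i \begin{cases} = 0 & \text{for } 1 \le i \le k_o, \\ > 0 & \text{for } k_o < i \le K. \end{cases}$$

Moreover, $f_i$ is antitonic (i.e. non-increasing) and $g_i$ is isotonic in $i \in \{k_o + 1, \ldots, K\}$.

(C.2$_{g,K}$) For $1 \le k \le K$,

$$\sum_{i=1}^k R_i'(g_i) \le \lambda_k \quad\text{with equality if}\quad \begin{cases} k < K \text{ and } g_k < g_{k+1}, \\ k = K. \end{cases}$$

(C.2$_{f,K}$) For $1 \le k \le K$,

$$\sum_{i=1}^k R_i'(f_i) \ge -\lambda_k \quad\text{with equality if}\quad \begin{cases} k < K \text{ and } f_k > f_{k+1}, \\ k = K. \end{cases}$$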




Note that Conditions (C.1$_K$) and (C.2$_{f\&g,K}$) imply the following fact:

(C.3$_K$) Let $\Lambda_o = \Lambda_o(f, g) := \sum_{i=1}^{k_o} R_i'(f_i)$. If $1 \le k_o < K$, then either

$$f_{k_o} = g_{k_o} > g_{k_o+1} > f_{k_o+1} \quad\text{and}\quad \Lambda_o = -\lambda_{k_o},$$

or

$$g_{k_o} = f_{k_o} < f_{k_o+1} < g_{k_o+1} \quad\text{and}\quad \Lambda_o = \lambda_{k_o}.$$

When the algorithm finishes with $K = n$, a solution $\hat f \in \mathbb{R}^n$ may be obtained as follows: If $k_o = n$, $\hat f := f = g$ satisfies the conditions of Lemma 2.2. If $k_o < n$, we define

$$\hat f_i := \begin{cases} f_i = g_i & \text{for } 1 \le i \le k_o, \\ r & \text{for } k_o < i \le n \end{cases}$$

with some number $r \in [f_{k_o+1}, g_{k_o+1}]$. This definition entails that $f_i \le \hat f_i \le g_i$ for all indices $i$, while $\hat f_i$ is constant in $i > k_o$. Hence one can easily deduce from Conditions (C.2$_{f\&g,n}$) that $\hat f$ satisfies the conditions of Lemma 2.2. To ensure a certain optimality property described later, in case of $1 \le k_o < n$ we choose

$$r := \begin{cases} g_{k_o+1} & \text{if } f_{k_o} = g_{k_o} > g_{k_o+1}, \\ f_{k_o+1} & \text{if } g_{k_o} = f_{k_o} < f_{k_o+1}. \end{cases}$$

Conditions (C.1$_K$) and (C.2$_{f\&g,K}$) are illustrated in the right panels of Figure 1 which show the same example as the left panels that were discussed in the previous section. Strictly speaking, Condition (A.3) is not satisfied here, but one can enforce it by adding $\min(z - c_1, 0)^2 + \max(z - c_2, 0)^2$ to $R_i(z) = \rho(z - Y_i)$ with arbitrary constants $c_1 < \min(Y_1, \ldots, Y_n)$ and $c_2 > \max(Y_1, \ldots, Y_n)$. This modification does not alter the solution which is contained in $[c_1, c_2]^n$; see Section 2.3.

The solid and grey lines in the upper panel represent $f$ and $g$, respectively, where $K = 20$ and $k_o = 3$. The lower panel shows the corresponding cumulative sums of the numbers $R_i'(f_i)$ and $R_i'(g_i)$, $1 \le i \le K$.

Some auxiliary functions and terminology. Later on we shall work with partitions of the set $\{1, 2, \ldots, n\}$ into index intervals and functions (vectors) which are constant on these intervals. To define the latter vectors efficiently we define

$$R_{jk}' := \sum_{i=j}^k R_i'$$

for indices $1 \le j \le k \le n$. Again, $R_{jk}'$ is continuous and isotonic on $\mathbb{R}$ with limits $R_{jk}'(\pm\infty) = \pm\infty$. Further, for real numbers $t$ let

$$\underline{M}_{jk}(t) := \min\{z \in \mathbb{R} : R_{jk}'(z) \ge t\},$$
$$\overline{M}_{jk}(t) := \max\{z \in \mathbb{R} : R_{jk}'(z) \le t\}.$$




These quantities $\underline{M}_{jk}(t)$ and $\overline{M}_{jk}(t)$ are isotonic in $t$, where $R_{jk}'(z) = t$ for any real number $z \in [\underline{M}_{jk}(t), \overline{M}_{jk}(t)]$.
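In the least-squares case $R_i(z) = (z - Y_i)^2/2$ the summed derivative over an index interval $\{j, \ldots, k\}$ is $(k - j + 1) z - \sum_{i=j}^k Y_i$, which is strictly increasing, so the smallest and largest solutions of "summed derivative equals $t$" coincide and have a closed form. The following sketch illustrates this; the function names are assumptions and the indices are 1-based as in the text.

```python
def summed_derivative(Y, j, k, z):
    """Sum of R_i'(z) = z - Y_i over i = j..k (1-based, inclusive)."""
    return sum(z - y for y in Y[j - 1:k])


def M_least_squares(Y, j, k, t):
    """Unique z with summed_derivative(Y, j, k, z) == t.

    For least squares the lower and upper versions of M_jk(t) agree
    because the summed derivative is strictly increasing in z.
    """
    segment = Y[j - 1:k]
    return (t + sum(segment)) / len(segment)


Y = [0.0, 1.0, 0.0]
print(M_least_squares(Y, 1, 1, -2.0))  # -2.0
print(M_least_squares(Y, 1, 1, 2.0))   # 2.0
print(M_least_squares(Y, 1, 2, 2.0))   # 1.5
```

With $t = 0$ the function simply returns the mean of the observations on the interval, which is why classical taut string fits are local means away from local extrema.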

The following lemma summarizes basic properties of the auxiliary functions $\overline{M}_{jk}$ which are essential for Algorithm I below. The functions $\underline{M}_{jk}$ satisfy analogous properties.

Lemma 3.1 Let $1 \le j \le k < \ell \le m \le n$ be indices with $\ell = k + 1$. Further let $c, u, v \in \mathbb{R}$ such that

$$R_{jk}'(c) = u.$$

(a) If $c > \overline{M}_{\ell m}(v)$, then

$$c > \overline{M}_{jm}(u + v) \ge \overline{M}_{\ell m}(v).$$

(b) If $c > \overline{M}_{jm}(u + v)$, then

$$\overline{M}_{jm}(u + v) \ge \overline{M}_{\ell m}(v).$$

For some vector $v = (v_i)_{i=a}^b$, a maximal index interval $J \subset \{a, \ldots, b\}$ such that $v_i$ is constant in $i \in J$ will be called a "segment of $v$".

Algorithm I: Step 1. For $K = 1$ we define

$$f_1 := \underline{M}_{1,1}(-\lambda_1) \quad\text{and}\quad g_1 := \overline{M}_{1,1}(\lambda_1).$$

Conditions (C.1$_1$) and (C.2$_{f\&g,1}$) are certainly satisfied.

Algorithm I: Step K + 1. Suppose that Conditions (C.1$_K$) and (C.2$_{f\&g,K}$) are satisfied for some $K \in \{1, \ldots, n - 1\}$. Since $\lambda_K > 0$, one can easily derive from (C.2$_{f\&g,K}$) that $k_o < K$. Subsequently we construct new candidates $\tilde f = (\tilde f_i)_{i=1}^{K+1}$ for $f$ and $\tilde g = (\tilde g_i)_{i=1}^{K+1}$ for $g$. In this context, (C.1$_{K+1}$) and (C.2$_{\ldots,K+1}$) always denote conditions on the new vectors $\tilde f$ and $\tilde g$ in place of $f$ and $g$, respectively, while (C.1$_K$) and (C.2$_{\ldots,K}$) refer to the original $f$ and $g$.

Initializing $\tilde g$. We set

$$\tilde g_i := \begin{cases} g_i & \text{for } i \le K, \\ \overline{M}_{K+1,K+1}(\lambda_{K+1} - \lambda_K) & \text{for } i = K + 1. \end{cases}$$

Since $\tilde g_i = g_i$ for all $i \le K$, one can easily verify that the inequality part of (C.2$_{g,K+1}$) is satisfied, and also $\sum_{i=1}^{K+1} R_i'(\tilde g_i) = \lambda_{K+1}$.

Modifying $\tilde g$. Suppose that $\tilde g_i$ is not isotonic in $i > k_o$. By the previous construction of $\tilde g$, this means that the two rightmost segments $\{j, \ldots, k\}$ and $\{\ell, \ldots, K + 1\}$ of $(\tilde g_i)_{i=k_o+1}^{K+1}$ satisfy $\tilde g_k > \tilde g_\ell$, where $k_o < j \le k = \ell - 1 \le K$. In that case we replace $\tilde g_i$, $i \in \{j, \ldots, K + 1\}$, with

$$\begin{cases} \overline{M}_{j,K+1}(\lambda_{K+1} - \lambda_{j-1}) & \text{if } j > k_o + 1, \\ \overline{M}_{j,K+1}(\lambda_{K+1} - \Lambda_o) & \text{if } j = k_o + 1. \end{cases}$$

This step is repeated, if necessary, until $\tilde g_i$ is isotonic in $i > k_o$. One can easily deduce from Part (a) of Lemma 3.1 that throughout $\tilde g_i \le g_i$ for $k_o < i \le K$ while $\sum_{i=1}^{K+1} R_i'(\tilde g_i) = \lambda_{K+1}$. Hence the inequality part of Condition (C.2$_{g,K+1}$) continues to hold. The equality statements of Condition (C.2$_{g,K+1}$) are true as well, the only possible exception being $k = k_o$. This exceptional case may occur only if $\tilde g_{k_o+1} < g_{k_o+1}$, and by our construction of $\tilde g$, this entails $\tilde g_i$ being constant in $i \in \{k_o + 1, \ldots, K + 1\}$.
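The initialization and modification of the upper candidate can be sketched in a few lines for the least-squares case $R_i(z) = (z - Y_i)^2/2$, where the auxiliary function over $\{j, \ldots, K+1\}$ has the closed form $(t + \sum_{i=j}^{K+1} Y_i)/(K + 2 - j)$. This is only a sketch of this single sub-step under stated assumptions (1-based quantities stored in 0-based Python lists, exact float comparison to detect segments, hypothetical function name); it is not the complete Algorithm I, which also updates the lower candidate and the index $k_o$.

```python
def extend_upper(g, Y, lam, k0, Lambda0):
    """Append g_{K+1} and merge rightmost segments until isotonic on i > k0.

    g       : current upper candidate g_1..g_K (g[i-1] holds g_i)
    Y       : observations Y_1..Y_n
    lam     : lam[j] = lambda_j for j = 0..n with lam[0] = lam[n] = 0
    k0      : index k_o of Condition (C.1_K)
    Lambda0 : sum over i <= k0 of R_i'(f_i)
    """
    K = len(g)
    new = list(g)
    # Initialization: solve R'_{K+1}(z) = lambda_{K+1} - lambda_K.
    new.append(Y[K] + lam[K + 1] - lam[K])
    while True:
        # l = first index (1-based) of the rightmost segment.
        l = K + 1
        while l > k0 + 1 and new[l - 2] == new[K]:
            l -= 1
        if l == k0 + 1 or new[l - 2] <= new[K]:
            return new  # isotonic on i > k0
        # j = first index of the segment ending at k = l - 1.
        j = l - 1
        while j > k0 + 1 and new[j - 2] == new[l - 2]:
            j -= 1
        t = lam[K + 1] - (lam[j - 1] if j > k0 + 1 else Lambda0)
        value = (t + sum(Y[j - 1:K + 1])) / (K + 2 - j)
        for i in range(j - 1, K + 1):
            new[i] = value


Y = [0.0, 1.0, 0.0]
lam = [0.0, 2.0, 2.0, 0.0]
print(extend_upper([2.0], Y, lam, 0, 0.0))  # [1.5, 1.5]
```

In the usage example the start value $g_1 = Y_1 + \lambda_1 = 2$ comes from Step 1, the initial $\tilde g_2 = 1$ violates monotonicity, and one merge produces the common value $1.5$, for which the residual sum $(1.5 - 0) + (1.5 - 1) = 2$ equals $\lambda_2$.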

Initializing $\tilde f$. We set

$$\tilde f_i := \begin{cases} f_i & \text{for } i \le K, \\ \underline{M}_{K+1,K+1}(-\lambda_{K+1} + \lambda_K) & \text{for } i = K + 1. \end{cases}$$

Again the inequality part of (C.2$_{f,K+1}$) is satisfied, and also $\sum_{i=1}^{K+1} R_i'(\tilde f_i) = -\lambda_{K+1}$.

Modifying $\tilde f$. Suppose that $\tilde f_i$ is not antitonic in $i > k_o$. This means that the two rightmost segments $\{j, \ldots, k\}$ and $\{\ell, \ldots, K + 1\}$ of $(\tilde f_i)_{i=k_o+1}^{K+1}$ satisfy $\tilde f_k < \tilde f_\ell$, where $k_o < j \le k = \ell - 1 \le K$. In that case we replace $\tilde f_i$, $i \in \{j, \ldots, K + 1\}$, with

$$\begin{cases} \underline{M}_{j,K+1}(-\lambda_{K+1} + \lambda_{j-1}) & \text{if } j > k_o + 1, \\ \underline{M}_{j,K+1}(-\lambda_{K+1} - \Lambda_o) & \text{if } j = k_o + 1. \end{cases}$$

This step is repeated, if necessary, until $\tilde f_i$ is antitonic in $i > k_o$. Throughout, $\tilde f_i \ge f_i$ for $k_o < i \le K$ while $\sum_{i=1}^{K+1} R_i'(\tilde f_i) = -\lambda_{K+1}$. Hence the inequality part of Condition (C.2$_{f,K+1}$) continues to be satisfied. The equality statements of Condition (C.2$_{f,K+1}$) are true as well, the only possible exception being $k = k_o$. This exceptional case may occur only if $\tilde f_{k_o+1} > f_{k_o+1}$, and this entails that $\tilde f_i$ is constant in $i > k_o$.

This step is illustrated in Figure 2 which shows again the example from Figure 1. Here $K = 17$ and the panels show from left to right the initialisation of $\tilde f$, the first modification of $\tilde f$ and the second and final modification. The panels in the upper row show the data and the approximations while the panels in the lower row show the corresponding cumulative sums of the numbers $R_i'(f_i)$ and $R_i'(g_i)$, $1 \le i \le K$.

Final modification of $\tilde f$ and $\tilde g$. Having completed the previous construction, we end up with vectors $\tilde f$ and $\tilde g$ satisfying the inequality parts of Conditions (C.2$_{f\&g,K+1}$). The equality parts are satisfied as well, with possible exceptions only for $k = k_o(f, g)$. Moreover, $\tilde f_i$ is antitonic and $\tilde g_i$ is isotonic in $i > k_o$. Finally, our explicit construction entails that

$$\tilde f_{k_o+1} = f_{k_o+1} \quad\text{or}\quad f_{k_o+1} < \tilde f_{k_o+1} = \cdots = \tilde f_{K+1},$$

and

$$\tilde g_{k_o+1} = g_{k_o+1} \quad\text{or}\quad g_{k_o+1} > \tilde g_{k_o+1} = \cdots = \tilde g_{K+1}.$$







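Suppose first that $\tilde f_{k_o+1} \le \tilde g_{k_o+1}$. Then one can easily deduce from (C.3$_K$) and the properties of $\tilde f$, $\tilde g$ just mentioned that Conditions (C.1$_{K+1}$) and (C.2$_{f\&g,K+1}$) are satisfied.

One particular instance of the previous situation is that both $\tilde f_i$ and $\tilde g_i$ are constant in $i > k_o$. For then

$$-\lambda_{K+1} = \sum_{i=1}^{K+1} R_i'(\tilde f_i) = \Lambda_o + R_{k_o+1,K+1}'(\tilde f_{k_o+1})$$

and

$$\lambda_{K+1} = \sum_{i=1}^{K+1} R_i'(\tilde g_i) = \Lambda_o + R_{k_o+1,K+1}'(\tilde g_{k_o+1}).$$

Now suppose that $\tilde f_{k_o+1} > \tilde g_{k_o+1}$. The previous considerations and our construction of $\tilde f$ and $\tilde g$ show that either

$$\tilde f_{K+1} \le \tilde f_{k_o+1} = f_{k_o+1} > \tilde g_{k_o+1},$$

or

$$\tilde g_{K+1} \ge \tilde g_{k_o+1} = g_{k_o+1} < \tilde f_{k_o+1}.$$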
We discuss only the former case, the latter case being handled analogously. Here Condition (C.2$_{f,K+1}$) is satisfied already. Let $\{k_o + 1, \ldots, k_1\}$ be the leftmost segment of $(\tilde f_i)_{i=k_o+1}^{K+1}$. Then we redefine $\tilde g$ as follows:

$$\tilde g_i := \begin{cases} \tilde f_i = g_i & \text{for } i \le k_o, \\ \tilde f_{k_o+1} & \text{for } k_o < i \le k_1, \\ \overline{M}_{k_1+1,K+1}(\lambda_{K+1} + \lambda_{k_1}) & \text{for } k_1 < i \le K + 1. \end{cases}$$

By assumption, $R_{k_o+1,k_1}'(\tilde f_{k_o+1}) = -\lambda_{k_1} - \Lambda_o$ and $\overline{M}_{k_o+1,K+1}(\lambda_{K+1} - \Lambda_o) < \tilde f_{k_o+1}$. Hence Part (b) of Lemma 3.1 entails that the new value of $\tilde g_{k_1+1}$ is not greater than $\overline{M}_{k_o+1,K+1}(\lambda_{K+1} - \Lambda_o)$, which is the old value of $\tilde g_{k_o+1} = \cdots = \tilde g_{K+1}$. Since $\tilde f_{k_o+1} = f_{k_o+1} < g_{k_o+1}$, we may conclude that the new vector $\tilde g$ still satisfies $\tilde g_i \le g_i$ for $k_o < i \le K$. Now one easily verifies that the new vector $\tilde g$ satisfies Condition (C.2$_{g,K+1}$).

It may happen that Condition (C.1$_{K+1}$) is still violated, i.e. $\tilde g_{k_1+1} < \tilde f_{k_1+1}$. In that case we repeat the previous update of $\tilde g$ with $k_1$ in place of $k_o$ and iterate this procedure until Condition (C.1$_{K+1}$) is satisfied as well.

This step is illustrated in Figure 3 which shows once more the example from Figures 1 and 2. Here $K = 23$ and the left panels show the situation before the final modification while the result of the modification is shown in the right panels. As always the panels in the upper row show the data and the approximations while the panels in the lower row show the corresponding cumulative sums of the numbers $R_i'(f_i)$ and $R_i'(g_i)$, $1 \le i \le K$.

Fig 3. Illustration of the final modification step for f in Algorithm I.

An optimality property of Algorithm I. The solution $\hat f$ produced by Algorithm I is as simple as possible in a certain sense. For a vector $f \in \mathbb{R}^n$, an index interval $\{j, \ldots, k\} \subset \{1, 2, \ldots, n\}$ with $j > 1$ or $k < n$ is called a

local maximum of $f$ if $f_j = \cdots = f_k > \max_{i \in \mathcal{N}} f_i$,
local minimum of $f$ if $f_j = \cdots = f_k < \min_{i \in \mathcal{N}} f_i$,

where $\mathcal{N} := \{j - 1, k + 1\} \cap \{1, 2, \ldots, n\}$.

Theorem 3.2 Let $\hat f$ be the vector produced by Algorithm I, and let $f$ be any vector in $\mathbb{R}^n$ such that

$$\Bigl| \sum_{i=1}^k R_i'(f_i) \Bigr| \le \lambda_k \quad\text{for } 1 \le k \le n. \qquad (5)$$

Then

$$\max_{i \in J} f_i \ge \max_{i \in J} \hat f_i \quad\text{for any local maximum } J \text{ of } \hat f,$$
$$\min_{i \in J} f_i \le \min_{i \in J} \hat f_i \quad\text{for any local minimum } J \text{ of } \hat f.$$

In particular the theorem shows that every vector $f$ satisfying the tube condition (5) must have at least the same number of local extreme values as $\hat f$.


3.2. Exponential families

Examples III and IV may be generalized as follows: Suppose that $Y_i$ has distribution $P_{f(x_i)}$ for some unknown real parameter $f(x_i)$, where $(P_\theta)_{\theta \in \mathbb{R}}$ is an exponential family with

$$\frac{dP_\theta}{d\nu}(y) = \exp(\theta y - b(\theta))$$

for some measure $\nu$ on the real line.

In case of Poisson regression, $\nu$ is a discrete measure on $\{0, 1, 2, \ldots\}$ with weights $\nu(\{k\}) = 1/k!$, so that $b(\theta) = b'(\theta) = b''(\theta) = e^\theta$. For binomial regression we choose $\nu$ to be counting measure on $\{0, 1\}$, whence $b(\theta) = \log(1 + e^\theta)$, $b'(\theta) = e^\theta/(1 + e^\theta)$ and $b''(\theta) = b'(\theta)(1 - b'(\theta))$.

Note that in general, $b(\cdot)$ is infinitely often differentiable with $b'(\theta)$ and $b''(\theta)$ being the mean and variance, respectively, of the distribution $P_\theta$. We assume that $b''(\theta)$ is strictly positive for all $\theta \in \mathbb{R}$. Note also that $\nu(\mathbb{R} \setminus [y_{\min}, y_{\max}]) = 0$, where $y_{\min}$ and $y_{\max}$ are the infimum and supremum, respectively, of the set $\{b'(\theta) : \theta \in \mathbb{R}\}$.
Now we consider minimization of minus the log-likelihood function plus the localized total variation penalty, i.e.

$$R_i(z) := b(z) - z Y_i.$$

By assumption, these functions are infinitely often differentiable with first derivative $R_i'(z) = b'(z) - Y_i$ and strictly positive second derivative $R_i''(z) = b''(z)$. Hence they satisfy Condition (A.1). Unfortunately they may fail to be coercive individually, and Condition (A.3) is not satisfied in general. However, one can show that Condition (A.2) is satisfied as soon as $Y$ is non-trivial in the sense that

$$\min(Y_1, \ldots, Y_n) < \max(Y_1, \ldots, Y_n). \qquad (6)$$

We will not pursue this here, because there is a simple solution of our estimation problem, at least in case of (6):

At first we apply the usual taut string method to the observations $Y_i$, i.e. we replace $R_i(z)$ with $\rho(z - Y_i)$, where $\rho(z) := z^2/2$. Let $\hat f^{LS}$ be the resulting least squares fit. It follows from Lemma 2.3 that

$$\{\hat f_1^{LS}, \ldots, \hat f_n^{LS}\} \subset (\min(Y_1, \ldots, Y_n), \max(Y_1, \ldots, Y_n)).$$

Thus the vector $\hat f$ with components

$$\hat f_i := (b')^{-1}(\hat f_i^{LS})$$

is well-defined and satisfies $\sum_{i=1}^k R_i'(\hat f_i) = \sum_{i=1}^k \rho'(\hat f_i^{LS} - Y_i)$. Hence applying Lemma 2.2 to $\hat f^{LS}$ in the least squares setting entails the analogous conditions for $\hat f$ in the maximum likelihood setting. This shows that $\hat f$ is indeed the unique minimizer of $T$.
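The transformation step can be illustrated directly. For Poisson regression $b'(\theta) = e^\theta$, so the inverse of $b'$ is the logarithm; for binary regression $b'$ is the logistic function, so its inverse is the logit. The sketch below assumes that a least-squares taut string fit is already available (here a hard-coded toy vector) and uses hypothetical function names.

```python
import math


def poisson_fit_from_least_squares(f_ls):
    """Apply the inverse of b'(theta) = exp(theta) componentwise."""
    return [math.log(v) for v in f_ls]


def binary_fit_from_least_squares(f_ls):
    """Apply the inverse of b'(theta) = exp(theta) / (1 + exp(theta))."""
    return [math.log(v / (1.0 - v)) for v in f_ls]


# Toy example: Y = (0, 2) with lambda_1 = 0.5 has least-squares fit (0.5, 1.5).
Y = [0.0, 2.0]
f_ls = [0.5, 1.5]
f_hat = poisson_fit_from_least_squares(f_ls)

# The Poisson residuals exp(f_hat_i) - Y_i agree with the least-squares
# residuals f_ls_i - Y_i, so both fits satisfy the same tube conditions.
for theta, v, y in zip(f_hat, f_ls, Y):
    print(math.exp(theta) - y, v - y)
```

Because the residuals coincide term by term, no separate optimization is needed in the exponential family setting once the least-squares fit is known.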

3.3. Ties among the $x_i$

For simplicity we assumed that $x_1 < x_2 < \cdots < x_n$. Now we relax this assumption temporarily to $x_1 \le x_2 \le \cdots \le x_n$ and describe a possible modification of Algorithm I.

Let $x_{(1)} < x_{(2)} < \cdots < x_{(m)}$ be the distinct elements of $\{x_1, x_2, \ldots, x_n\}$, and let $i(k) := \max\{i : x_i = x_{(k)}\}$. Then we restrict our attention to vectors $f \in \mathbb{R}^n$ such that $f_i = f_j$ for $i(k - 1) < i < j \le i(k)$, $1 \le k \le m$, where $i(0) := 0$. The target functional has to be rewritten as

$$T(f) = \sum_{i=1}^n R_i(f_i) + \sum_{k=1}^{m-1} \lambda_k |f_{i(k+1)} - f_{i(k)}|.$$

Now Algorithm I uses induction on $K = 1, 2, \ldots, m$. In Step 1 we define

$$f_i := \underline{M}_{1,i(1)}(-\lambda_1) \quad\text{and}\quad g_i := \overline{M}_{1,i(1)}(\lambda_1) \quad\text{for } 1 \le i \le i(1).$$

In Step $K + 1$ we aim for vectors $\tilde f$ and $\tilde g$ in $\mathbb{R}^{i(K+1)}$. The initial versions are given by

$$\tilde g_i := \begin{cases} g_i & \text{for } i \le i(K), \\ \overline{M}_{i(K)+1,i(K+1)}(+\lambda_{K+1} - \lambda_K) & \text{for } i(K) < i \le i(K + 1), \end{cases}$$

$$\tilde f_i := \begin{cases} f_i & \text{for } i \le i(K), \\ \underline{M}_{i(K)+1,i(K+1)}(-\lambda_{K+1} + \lambda_K) & \text{for } i(K) < i \le i(K + 1), \end{cases}$$

while the remainder of Step $K + 1$ remains unchanged.
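The bookkeeping for ties amounts to finding, for each distinct design point, the last index at which it occurs. A minimal sketch, assuming the design points are already sorted in non-decreasing order and using a hypothetical function name:

```python
def last_index_per_distinct_value(x):
    """Return the 1-based indices i(1) < ... < i(m) for sorted x.

    i(k) is the largest index i with x_i equal to the k-th distinct value.
    """
    idx = []
    for i, value in enumerate(x, start=1):
        if idx and x[idx[-1] - 1] == value:
            idx[-1] = i        # same design point: move i(k) to the right
        else:
            idx.append(i)      # a new distinct design point starts here
    return idx


x = [1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
print(last_index_per_distinct_value(x))  # [2, 3, 6]
```

With these indices the blocks $\{i(k-1)+1, \ldots, i(k)\}$ are exactly the groups of observations that are forced to share one fitted value.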



4. The case of arbitrary functions $R_i$

In this section we describe how to calculate solutions to the general setting described in Section 2. In particular we investigate how to solve the quantile regression problem presented in Example II. Throughout this section we assume that Conditions (A.1-2) are satisfied, while some of the functions $R_i$ may fail to satisfy the regularity condition (A.3).

4.1. Approximating the $R_i$

We start with a general observation: For $\varepsilon > 0$ and $i \in \{1, 2, \ldots, n\}$ let $R_{i,\varepsilon} : \mathbb{R} \to \mathbb{R}$ be a (data-driven) convex function such that

$$\lim_{\varepsilon \downarrow 0} R_{i,\varepsilon}(z) = R_i(z) \quad\text{for any } z \in \mathbb{R}. \qquad (7)$$

The corresponding approximation $T_\varepsilon(f) = T_{\lambda,\varepsilon}(f)$ to $T(f)$ equals $\sum_{i=1}^n R_{i,\varepsilon}(f_i) + \sum_{j=1}^{n-1} \lambda_j |f_{j+1} - f_j|$ and has the following properties:

Theorem 4.1 For sufficiently small $\varepsilon > 0$, the set $\hat{\mathcal{F}}_\varepsilon := \mathrm{argmin}_{f \in \mathbb{R}^n} T_\varepsilon(f)$ is nonvoid and compact. Moreover, it approximates the set $\hat{\mathcal{F}} := \mathrm{argmin}_{f \in \mathbb{R}^n} T(f)$ in the sense that

$$\max_{f_\varepsilon \in \hat{\mathcal{F}}_\varepsilon} \min_{f \in \hat{\mathcal{F}}} \|f_\varepsilon - f\| \to 0 \quad\text{as } \varepsilon \downarrow 0.$$

If we find approximations $R_{i,\varepsilon}$ satisfying additionally Condition (A.3), we can minimize the target functional $T(\cdot)$ at least approximately by means of Algorithm I. One possible definition of such functions $R_{i,\varepsilon}$ is given by

$$R_{i,\varepsilon}(z) := \frac{1}{2\varepsilon} \int_{z-\varepsilon}^{z+\varepsilon} R_i(t)\,dt + \max(z - 1/\varepsilon, 0)^2 + \min(z + 1/\varepsilon, 0)^2.$$



Here one can easily verify (7) and Condition (A.3) with $R_{i,\varepsilon}$ in place of $R_i$.
4.2. A non-iterative solution for Example II

Apparently the preceding considerations lead to an iterative procedure for minimizing $T$. But such a detour is not always necessary. In this section we derive an explicit combinatorial algorithm for Example II. Recall that for given $Y = (Y_i)_{i=1}^n$, our goal is to minimize $T(f)$ with $R_i(z) := \rho_\beta(z - Y_i)$. Note first that

$$R_i'(z+) = 1\{Y_i \le z\} - \beta \quad\text{and}\quad R_i'(z-) = 1\{Y_i < z\} - \beta.$$

This indicates already that for quantile regression mainly the ranks of the vector $Y$ matter. Precisely, we shall work with a permutation $Z = (Z_i)_{i=1}^n$ of $(1, 2, \ldots, n)$ such that

$$\#\{i : Y_i < Y_j\} + 1 \le Z_j \le \#\{i : Y_i \le Y_j\} \quad\text{for } 1 \le j \le n. \qquad (8)$$

That means, $Z$ is a rank vector of $Y$ but without the usual modification in case of ties. The usefulness of this will become clear later. Solving the original problem with $Z$ in place of $Y$ would not be much easier. But now we replace $\rho_\beta(z - Z_i)$ with a smooth function $\tilde R_i(z)$ such that

$$\tilde R_i'(z) = \begin{cases} z - \beta & \text{if } z \le 0, \\ -\beta & \text{if } 0 \le z \le Z_i - 1, \\ z - Z_i + 1 - \beta & \text{if } Z_i - 1 \le z \le Z_i, \\ 1 - \beta & \text{if } Z_i \le z \le n, \\ z - n + 1 - \beta & \text{if } z \ge n; \end{cases}$$

see also Figure 4. The idea behind $\tilde R_i$ is to replace $\rho_\beta(z - Z_i)$ with $\int_{Z_i - 1}^{Z_i} \rho_\beta(z - t)\,dt$, which would result in the derivative $\min(\max(z - Z_i + 1 - \beta, -\beta), 1 - \beta)$. The extra modifications on $(-\infty, 0)$ and $(n, \infty)$ are just to ensure the strong coercivity part of Condition (A.3). Thus we propose to minimize

$$\tilde T(g) := \sum_{i=1}^n \tilde R_i(g_i) + \sum_{j=1}^{n-1} \lambda_j |g_{j+1} - g_j|$$

by means of Algorithm I and then to utilize the following result.

Theorem 4.2 Suppose that $\tilde g$ minimizes $\tilde T$ over $\mathbb{R}^n$. Then $\tilde g \in (\beta, n - 1 + \beta)^n$. Furthermore, let $\hat f \in \mathbb{R}^n$ be given by

$$\hat f_i := Y_{(\lceil \tilde g_i \rceil)}$$

with the order statistics $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$ of $Y$ and $\lceil a \rceil$ denoting the smallest integer not smaller than $a \in \mathbb{R}$. Then $\hat f$ minimizes $T$ over $\mathbb{R}^n$.
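Two small pieces of this construction are easy to sketch: building a rank permutation of the observations in which ties receive distinct consecutive ranks, and mapping a rank-scale solution back to the data scale through order statistics and the ceiling function. The helper names below are assumptions, and the rank-scale vector in the usage example is a made-up input, not the output of an actual minimization.

```python
import math


def rank_permutation(Y):
    """Ranks 1..n of Y where tied values get distinct consecutive ranks.

    Ties are broken by position, which is one admissible choice; any
    assignment of the tied ranks would satisfy the rank condition.
    """
    order = sorted(range(len(Y)), key=lambda i: Y[i])  # stable sort
    Z = [0] * len(Y)
    for rank, i in enumerate(order, start=1):
        Z[i] = rank
    return Z


def fit_from_rank_scale(g, Y):
    """Map rank-scale values g_i in (0, n) to order statistics of Y."""
    ordered = sorted(Y)
    return [ordered[math.ceil(v) - 1] for v in g]


Y = [3.0, 1.0, 3.0, 2.0]
print(rank_permutation(Y))                          # [3, 1, 4, 2]
print(fit_from_rank_scale([2.4, 2.4, 3.0, 0.7], Y))  # [3.0, 3.0, 3.0, 1.0]
```

The stable sort is a design choice that keeps the construction deterministic when observations coincide.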
Fig 4. The derivative $\tilde R_i'$.



Algorithm II. Summarizing the preceding findings, we may compute the 
estimated /3-quantile function by means of Algorithm I, applied to the functions 
R[ in place of R^. Let us just comment on the corresponding auxiliary functions 

M jk (t) := mm{z £ E : R' jk (z) > t}, 

Mjk(t) := max{z £ R : R' jk (z) < t}. 



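If $z_{(1)} < z_{(2)} < \cdots < z_{(\ell)}$ are the $\ell = k - j + 1$ order statistics of $(Z_i)_{i=j}^k$, then

\[
\tilde R_{jk}'(z) = \begin{cases}
\ell \min(z, 0) - \ell\beta & \text{if } z \le z_{(1)} - 1, \\
i + z - z_{(i)} - \ell\beta & \text{if } z_{(i)} - 1 \le z \le z_{(i)},\ 1 \le i \le \ell, \\
i - \ell\beta & \text{if } z_{(i)} \le z \le z_{(i+1)} - 1,\ 1 \le i < \ell, \\
\ell\,(1 - \beta + \max(z - n, 0)) & \text{if } z \ge z_{(\ell)}.
\end{cases}
\]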



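This entails that

\[
\underline{M}_{jk}(t) = \begin{cases}
t/\ell + \beta & \text{if } t \le -\ell\beta, \\
z_{(i)} + t - i + \ell\beta & \text{if } i - 1 - \ell\beta < t \le i - \ell\beta,\ 1 \le i \le \ell, \\
n + t/\ell - 1 + \beta & \text{if } t > \ell(1 - \beta),
\end{cases}
\]

\[
\overline{M}_{jk}(t) = \begin{cases}
t/\ell + \beta & \text{if } t < -\ell\beta, \\
z_{(i)} + t - i + \ell\beta & \text{if } i - 1 - \ell\beta \le t < i - \ell\beta,\ 1 \le i \le \ell, \\
n + t/\ell - 1 + \beta & \text{if } t \ge \ell(1 - \beta).
\end{cases}
\]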
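
The two generalized inverses can be evaluated directly from the sorted ranks of a segment. The following Python sketch assumes the piecewise form given above; the names `m_lower` and `m_upper` are hypothetical, and `zs` stands for the sorted ranks $z_{(1)} < \cdots < z_{(\ell)}$ of the segment.

```python
import math

def m_lower(t, zs, n, beta):
    """Smallest z whose summed smoothed derivative over the segment is >= t."""
    ell = len(zs)
    if t <= -ell * beta:
        return t / ell + beta
    if t > ell * (1 - beta):
        return n + t / ell - 1 + beta
    i = math.ceil(t + ell * beta)          # i - 1 - ell*beta < t <= i - ell*beta
    return zs[i - 1] + t - i + ell * beta

def m_upper(t, zs, n, beta):
    """Largest z whose summed smoothed derivative over the segment is <= t."""
    ell = len(zs)
    if t < -ell * beta:
        return t / ell + beta
    if t >= ell * (1 - beta):
        return n + t / ell - 1 + beta
    i = math.floor(t + ell * beta) + 1     # i - 1 - ell*beta <= t < i - ell*beta
    return zs[i - 1] + t - i + ell * beta

zs, n, beta = [2, 5], 6, 0.5
flat_piece = (m_lower(0.0, zs, n, beta), m_upper(0.0, zs, n, beta))
```

With these inputs the summed derivative is zero on the whole interval from 2 to 4, so the two inverses differ exactly on that flat piece and agree wherever the derivative is strictly increasing.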
To implement this algorithm efficiently, one should use two additional vector
variables $Z^{(f)}$ and $Z^{(g)}$ such that for each segment $\{j, \ldots, k\}$ of $f$ (resp. $g$),
$(Z^{(f)}_i)_{i=j}^k$ (resp. $(Z^{(g)}_i)_{i=j}^k$) contains the order statistics of $(Z_i)_{i=j}^k$. Whenever
two segments of $f$ (resp. $g$) are merged, the vector $Z^{(f)}$ (resp. $Z^{(g)}$) may be
updated by a suitable version of MergeSort (Knuth, 1998).
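
The merge step that keeps the ranks of a combined segment sorted might look as follows. This is a generic textbook merge written for illustration, not the authors' implementation, and the name `merge_sorted` is hypothetical.

```python
def merge_sorted(left, right):
    """Merge two sorted lists of ranks into one sorted list in linear time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of the two tails is non-empty
    merged.extend(right[j:])
    return merged

ranks = merge_sorted([1, 4, 6], [2, 3, 7])
```

Because both inputs are already sorted, merging two adjacent segments costs time proportional to their combined length, which is cheaper than sorting the union from scratch.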



5. The choice of tuning parameters $\lambda_j$
5.1. Constant and fixed $\lambda$



Let us first discuss the simple case of a constant value $\lambda > 0$ for all $\lambda_j$. In
Example I, let $\hat\sigma$ be some consistent estimator of the standard deviation of the
variables $Y_i$, assuming temporarily homoscedastic errors $Y_i - \mu(x_i)$. For instance,
$\hat\sigma$ could be the estimator proposed by Rice (1984) or the version based on the
MAD by Donoho et al. (1995). Since $R_i'(z) = z - Y_i$, and since for large $n$ the
process

\[ [0, 1] \ni t \mapsto n^{-1/2}\, \hat\sigma^{-1} \sum_{i \le nt} (\mu(x_i) - Y_i) \]

behaves similarly to a standard Brownian motion by virtue of Donsker's invariance
principle, one could use

\[ \lambda = c\, n^{1/2}\, \hat\sigma \]

for some constant $c > 0$. In our experience with simulated data, a value of $c$
within $[0.15, 0.25]$ often yielded satisfactory results.

In Example II, note that the data $Y_i$ may be coupled with independent
Bernoulli random variables $\xi_i \in \{0, 1\}$ with mean $\beta$ such that

\[ \sum_{i=j}^k (\xi_i - \beta) \le \sum_{i=j}^k R_i'(Q_\beta(x_i)\,+) = \sum_{i=j}^k \bigl(1\{Y_i \le Q_\beta(x_i)\} - \beta\bigr) \tag{9} \]

for $1 \le j \le k \le n$. Since $t \mapsto n^{-1/2} (\beta(1 - \beta))^{-1/2} \sum_{i \le nt} (\xi_i - \beta)$ behaves
asymptotically like a standard Brownian motion, too, we propose

\[ \lambda = c\, n^{1/2}\, (\beta(1 - \beta))^{1/2}. \]
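
For concreteness, both global tuning constants can be computed as in the Python sketch below. The function names are hypothetical. The scale estimate `mad_sigma` is one common MAD-type estimator based on first differences and is an assumption here, not necessarily the estimator of Rice (1984) or Donoho et al. (1995). The default `c = 0.2` lies in the range reported for Example I, and reusing it for the quantile case is likewise an assumption.

```python
import math
import statistics

def mad_sigma(y):
    """MAD-type scale estimate from first differences (an assumed choice)."""
    diffs = [y[i + 1] - y[i] for i in range(len(y) - 1)]
    med = statistics.median(diffs)
    mad = statistics.median([abs(d - med) for d in diffs])
    # 1.4826 makes the MAD consistent for Gaussian noise; differences of
    # independent errors have standard deviation sqrt(2) * sigma
    return 1.4826 * mad / math.sqrt(2)

def lambda_mean(y, c=0.2):
    """Global lambda for least squares regression: c * sqrt(n) * sigma_hat."""
    return c * math.sqrt(len(y)) * mad_sigma(y)

def lambda_quantile(n, beta, c=0.2):
    """Global lambda for beta-quantile regression: c * sqrt(n * beta * (1 - beta))."""
    return c * math.sqrt(n) * math.sqrt(beta * (1 - beta))

lam_median = lambda_quantile(100, 0.5)          # 0.2 * 10 * 0.5
lam_mean = lambda_mean([0.0, 1.0, 0.0, 1.0, 0.0])
```

Note that the quantile version needs no scale estimate at all, since the derivatives of the check function are bounded and their variance is determined by $\beta$ alone.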



5.2. Adaptive choice of the $\lambda_j$

Let $f$ be the unknown underlying regression function. Our goal is to find a
"simple" vector (function) $\hat f$ which is adequate for the data in the sense that the
deviations between the data and $\hat f$ satisfy a multiresolution criterion (Davies and
Kovac, 2001), where we require the deviations between data and $\hat f$ at different
scales and locations to be no larger than we would expect from noise. More
precisely, we require for each interval $\{j, \ldots, k\}$ from a collection $\mathcal{I}_n$ of index
intervals in $\{1, \ldots, n\}$ that

\[ \sum_{i=j}^k R_i'(\hat f_i\,+) \ge \underline{\eta}(\hat f_j, \ldots, \hat f_k) \quad\text{and}\quad \sum_{i=j}^k R_i'(\hat f_i\,-) \le \overline{\eta}(\hat f_j, \ldots, \hat f_k). \tag{10} \]

The bounds $\underline{\eta}(\cdot) < 0 < \overline{\eta}(\cdot)$, to be specified later, will be chosen such that
the inequalities above are satisfied with high probability when $\hat f$ is replaced
with the true regression function $f$. A typical choice for $\mathcal{I}_n$ is the family of all
$n(n + 1)/2$ such intervals $\{j, \ldots, k\}$. Computational complexity can be reduced
by considering a smaller collection such as the family of all intervals with dyadic
endpoints,

\[ \{2^\ell m + 1, \ldots, 2^\ell (m + 1)\}, \]

where $0 \le \ell \le \lfloor \log_2 n \rfloor$ and $0 \le m \le \lfloor 2^{-\ell}(n - 1) \rfloor$. The difference in
computational speed between these two choices for $\mathcal{I}_n$ is easily noticeable in practice.
The effect on the resulting approximation $\hat f$, however, is rather small. If a vector
$g$ does not approximate the data well on some interval $J$ which is not part of
the scheme with dyadic endpoints, then occasionally the multiscale criterion
using the dyadic endpoints will consider $g$ to be adequate where the multiscale
criterion which makes use of all subintervals will notice the lack of fit. Since this
effect is barely noticeable, we prefer to use smaller collections such as the family
with dyadic endpoints, which was also used in the simulation study in Section 6.
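
To see how much smaller the dyadic family is, the following Python sketch enumerates it. The name `dyadic_intervals` is hypothetical, and truncating the last interval at `n` when `n` is not a power of two is an assumption on our part.

```python
def dyadic_intervals(n):
    """Intervals {2^l m + 1, ..., 2^l (m + 1)} as 1-based (start, end) pairs."""
    intervals = []
    max_level = n.bit_length() - 1             # floor(log2(n)) for n >= 1
    for level in range(max_level + 1):
        width = 2 ** level
        for m in range((n - 1) // width + 1):  # 0 <= m <= floor((n - 1) / 2^l)
            start = width * m + 1
            end = min(width * (m + 1), n)      # assumed truncation at n
            intervals.append((start, end))
    return intervals

n = 8
dyadic = dyadic_intervals(n)
all_count = n * (n + 1) // 2
```

For `n = 8` this gives 15 dyadic intervals against 36 intervals in the full family; in general the dyadic family has fewer than `2n` members, while the full family grows quadratically.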

To obtain a vector $\hat f$ satisfying (10) for all intervals in $\mathcal{I}_n$, we propose
an iterative method for the data-driven choice of the tuning parameters $\lambda_j$.
This approach generalizes the local squeezing technique from Davies and Kovac
(2001) to our general setting. We start with some constant tuning vector $\lambda^{(1)} =
(\lambda, \lambda, \ldots, \lambda)$ where $\lambda$ is chosen so large that the corresponding fit $\hat f^{(1)}$ is
constant. Now suppose that we have already chosen tuning vectors $\lambda^{(1)}, \ldots, \lambda^{(s)}$,
and the corresponding fits are denoted by $\hat f^{(1)}, \ldots, \hat f^{(s)}$. If $\hat f^{(s)}$ is still
inadequate for the data, we define $\mathcal{J}^{(s)}$ to be the union of all intervals $\{j - 1, j, \ldots, k\}$
such that $\{j, \ldots, k\}$ is an interval in $\mathcal{I}_n$ violating (10) with $\hat f = \hat f^{(s)}$. Then for
some fixed $\gamma \in (0, 1)$, e.g. $\gamma = 0.9$, we define

\[
\lambda_i^{(s+1)} := \begin{cases}
\gamma \lambda_i^{(s)} & \text{if } i \in \mathcal{J}^{(s)}, \\
\lambda_i^{(s)} & \text{otherwise.}
\end{cases}
\]

One can easily derive from (3) that for sufficiently large $s$ the fit $\hat f = \hat f^{(s)}$ does
satisfy (10) for all $\{j, \ldots, k\} \in \mathcal{I}_n$.
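
One squeezing step can be sketched in Python as follows. The sketch assumes that the update multiplies $\lambda_i$ by $\gamma$ on the union of the enlarged violating intervals and leaves the other components unchanged; the name `squeeze` and the list-based representation are hypothetical.

```python
def squeeze(lam, violating, gamma=0.9):
    """One local squeezing step.

    lam       -- list with lam[i - 1] = lambda_i for i = 1, ..., n - 1
    violating -- 1-based pairs (j, k) of intervals failing the multiresolution check
    """
    shrink = set()
    for j, k in violating:
        shrink.update(range(j - 1, k + 1))     # enlarged interval {j - 1, ..., k}
    # indices outside 1..n-1 (such as 0 or n) simply never match a component
    return [gamma * value if i in shrink else value
            for i, value in enumerate(lam, start=1)]

lam_next = squeeze([1.0] * 5, [(3, 4)])
```

In a full implementation this step would alternate with refitting the taut string and rechecking the criterion until no interval is flagged.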

Example I (continued). In this case the multiresolution criterion (10)
inspects the sums of residuals on all intervals $I \in \mathcal{I}_n$. If we assume additive and
homoscedastic Gaussian white noise, possible choices for $\overline{\eta}(\cdot)$ and $\underline{\eta}(\cdot)$ are

\[ \overline{\eta}(f_j, \ldots, f_k) = -\underline{\eta}(f_j, \ldots, f_k) := \sigma \sqrt{k - j + 1} \cdot \sqrt{2 \log n}, \]

\[ \overline{\eta}(f_j, \ldots, f_k) = -\underline{\eta}(f_j, \ldots, f_k) := \sigma \sqrt{k - j + 1} \cdot \Bigl( \sqrt{2 \log\bigl(\tfrac{e\,n}{k - j + 1}\bigr)} + c \Bigr) \]

for some $c > 0$. The first proposal coincides exactly with the local squeezing
technique by Davies and Kovac (2001). The second one is motivated by results
of Dümbgen and Spokoiny (2001).
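
A small Python sketch of the two Gaussian bounds and of the resulting check on one interval follows. It assumes that the second bound has a logarithmic term of the form $\log(e\,n/(k - j + 1))$, which is our reading of the formula; the function names are hypothetical.

```python
import math

def eta_uniform(sigma, j, k, n):
    """Bound sigma * sqrt(k - j + 1) * sqrt(2 log n), the same level on every scale."""
    return sigma * math.sqrt(k - j + 1) * math.sqrt(2 * math.log(n))

def eta_scale_adapted(sigma, j, k, n, c):
    """Scale-dependent bound; the log(e * n / length) term is an assumed reading."""
    length = k - j + 1
    return sigma * math.sqrt(length) * (math.sqrt(2 * math.log(math.e * n / length)) + c)

def interval_ok(residuals, j, k, bound):
    """Check whether the summed residuals on {j, ..., k} (1-based) stay within +-bound."""
    return abs(sum(residuals[j - 1:k])) <= bound

residuals = [0.3, -0.2, 0.4, -0.1]
ok = interval_ok(residuals, 1, 4, eta_uniform(1.0, 1, 4, 4))
```

The scale-adapted bound is smaller than the uniform one on long intervals (for small `c`) and larger on very short ones, so it shifts sensitivity toward large-scale deviations.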

Example II (continued). If we assume that $Y_i = f_i + \epsilon_i$ where the $\epsilon_1, \ldots, \epsilon_n$
are independent with $\beta$-quantile $0$ and continuous distribution function, then
both $\sum_{i=j}^k (R_i'(f_i\,+) + \beta)$ and $\sum_{i=j}^k (R_i'(f_i\,-) + \beta)$ are binomially distributed
with parameters $k - j + 1$ and $\beta$. Let $B(x; N, p)$ be the distribution function of a
binomial distribution with parameters $N$ and $p$. Then we define $\overline{\eta}(f_j, \ldots, f_k) =
\overline{h}(k - j + 1)$ minimal and $\underline{\eta}(f_j, \ldots, f_k) = \underline{h}(k - j + 1)$ maximal such that

\[ B\bigl((k - j + 1)\beta + \overline{h}(k - j + 1);\ k - j + 1, \beta\bigr) \ge 1 - n^{-1} \quad\text{and} \]
\[ B\bigl((k - j + 1)\beta + \underline{h}(k - j + 1);\ k - j + 1, \beta\bigr) \le n^{-1}. \]
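
The binomial bounds can be computed exactly for moderate interval lengths. The Python sketch below assumes that the upper bound is the smallest and the lower bound the largest shift satisfying the two inequalities, and it restricts the search to integer counts; the function names are hypothetical.

```python
import math

def binom_cdf(m, N, p):
    """P(X <= m) for X ~ Bin(N, p) and an integer m."""
    if m < 0:
        return 0.0
    return sum(math.comb(N, i) * p**i * (1 - p)**(N - i)
               for i in range(min(m, N) + 1))

def quantile_bounds(N, beta, n):
    """Return (lower, upper) bounds for an interval of length N at level 1/n."""
    level = 1.0 / n
    # smallest count whose cdf reaches 1 - 1/n
    upper_count = next(m for m in range(N + 1) if binom_cdf(m, N, beta) >= 1 - level)
    # largest count (possibly -1) whose cdf is still at most 1/n
    lower_count = max(m for m in range(-1, N + 1) if binom_cdf(m, N, beta) <= level)
    return lower_count - N * beta, upper_count - N * beta

bounds = quantile_bounds(10, 0.5, 100)
```

For long intervals a normal approximation or a library routine would be preferable to this direct summation.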

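Example III (continued). We assume that each $Y_i$ is Poisson distributed
with parameter $\exp(f_i)$. Then $\sum_{i=j}^k Y_i = \sum_{i=j}^k \exp(f_i) - \sum_{i=j}^k R_i'(f_i)$ is again
Poisson distributed with parameter $\sum_{i=j}^k \exp(f_i)$. With $P(\cdot\,; \ell)$ denoting the
distribution function of the Poisson distribution with parameter $\ell$, we define
$\underline{\eta}(f_j, \ldots, f_k)$ to be $\underline{h}\bigl(\sum_{i=j}^k \exp(f_i)\bigr)$ and $\overline{\eta}(f_j, \ldots, f_k) = \overline{h}\bigl(\sum_{i=j}^k \exp(f_i)\bigr)$, where
$\underline{h}(\ell)$ is maximal and $\overline{h}(\ell)$ is minimal such that

\[ P(\ell - \underline{h}(\ell); \ell) \ge 1 - n^{-1} \quad\text{and}\quad P(\ell - \overline{h}(\ell); \ell) \le n^{-1}. \]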

Example IV (continued). Suppose that $Y_1, \ldots, Y_n$ are binomially distributed
with parameters $1$ and $p_i = \exp(f_i)/(1 + \exp(f_i))$. Then $\sum_{i=j}^k Y_i$ may be written
as $\sum_{i=j}^k p_i - \sum_{i=j}^k R_i'(f_i)$. Following Hoeffding's (1956) finding that the
deviations of $\sum_{i=j}^k Y_i$ from its mean $\sum_{i=j}^k p_i$ tend to be largest in case of equal
probabilities $p_i$, we define $\underline{\eta}(f_j, \ldots, f_k) = \underline{h}(k - j + 1, \bar p_{jk})$ and $\overline{\eta}(f_j, \ldots, f_k) =
\overline{h}(k - j + 1, \bar p_{jk})$, where $\bar p_{jk}$ denotes the mean of $p_j, \ldots, p_k$ while $\underline{h}(N, p)$ is
maximal and $\overline{h}(N, p)$ is minimal such that

\[ B(Np - \underline{h}(N, p); N, p) \ge 1 - n^{-1} \quad\text{and}\quad B(Np - \overline{h}(N, p); N, p) \le n^{-1}. \]

For the consistency results to follow, it is crucial that the adaptive choice of
the $\lambda_i$ yields a fit $\hat f$ such that for some constant $c_o$,

\[ \pm \sum_{i=j}^k R_i'(\hat f_i\,\mp) \le (c_o (k - j + 1) \log n)^{1/2} + c_o \log n \quad\text{for all } 1 \le j \le k \le n. \tag{11} \]

For Example I this is obvious, at least if $\mathcal{I}_n$ comprises all subintervals of $\{1, \ldots, n\}$.
By means of suitable exponential inequalities one can verify the multiscale
criterion (11) for Examples II-IV, too.
Fig 5. Rescaled versions of standard test signals by Donoho and Johnstone.



6. Numerical examples 

A simulation study was carried out to compare the median number of local 
extreme values for nine different versions of the general taut string method. 
Figure 5 shows rescaled versions of four standard test signals by Donoho (1993) 
and Donoho and Johnstone (1994) that have been used to create samples under 
four different test beds as described in detail below. For each function / and 
sample size n the following test beds were considered: 

• Gaussian: Independent normal observations

\[ Y_i \sim \mathcal{N}(f(i/n), 0.4), \quad i = 1, \ldots, n \]

were generated. The usual taut string method and the quantile version
with $\beta = 0.5$ were applied to recover $f$, and the quantile version with
$\beta = 0.1$ and $\beta = 0.9$ was used to recover the 0.1- and 0.9-quantile curves
of the data, which are approximately $f - 0.513$ and $f + 0.513$.

• Cauchy: Similarly Cauchy observations were generated by

\[ Y_i \sim C(f(i/n), 0.4), \quad i = 1, \ldots, n, \]

where $C(l, s)$ denotes the Cauchy distribution with location $l$ and scale $s$
having density function

\[ p(y) = \bigl(\pi s\,(1 + ((y - l)/s)^2)\bigr)^{-1}. \]

Since the mean of the Cauchy distribution does not exist, only the quantile
taut string was applied with quantiles 0.1, 0.5 and 0.9 to recover $f - 1.231$,
$f$ and $f + 1.231$.

• Binary: Binary observations were obtained by sampling from a Binomial
distribution:

\[ Y_i \sim \mathrm{Bin}(1, p_i), \quad p_i = (f(i/n) - a)/(b - a), \quad i = 1, \ldots, n \]

with $b = \max_{t \in [0,1]} f(t)$ and $a = \min_{t \in [0,1]} f(t)$. Then the taut string
method for binary data was used to recover $p_i$.

• Poisson: Finally, Poisson data were derived by

\[ Y_i \sim \mathrm{Poisson}(l_i), \quad l_i = f(i/n) - a, \quad i = 1, \ldots, n \]

with $a = \min_{t \in [0,1]} f(t)$. Then the taut string method for Poisson data
was applied to recover $l_i$.
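
The quantile offsets quoted for the Gaussian and Cauchy test beds can be checked numerically, reading 0.4 as the standard deviation and the Cauchy scale, respectively. The Python sketch below also draws one sample from three of the test beds. The helper name `sample_testbeds`, the sine test signal and the use of grid values for the minimum and maximum are assumptions made for illustration, and the Poisson case is omitted because the standard library has no Poisson sampler.

```python
import math
import random
from statistics import NormalDist

# offset of the 0.9-quantile from the centre for N(f, sd = 0.4) and C(f, scale = 0.4)
gauss_offset = 0.4 * NormalDist().inv_cdf(0.9)
cauchy_offset = 0.4 * math.tan(math.pi * (0.9 - 0.5))

def sample_testbeds(f, n, rng):
    """Draw one Gaussian, one Cauchy and one binary sample on the grid i/n."""
    values = [f(i / n) for i in range(1, n + 1)]
    a, b = min(values), max(values)            # grid approximation of min and max of f
    gaussian = [rng.gauss(v, 0.4) for v in values]
    # inverse-cdf sampling for the Cauchy distribution
    cauchy = [v + 0.4 * math.tan(math.pi * (rng.random() - 0.5)) for v in values]
    binary = [1 if rng.random() < (v - a) / (b - a) else 0 for v in values]
    return gaussian, cauchy, binary

rng = random.Random(0)
gaussian, cauchy, binary = sample_testbeds(lambda t: math.sin(2 * math.pi * t), 512, rng)
```

Both offsets round to the values given in the text, 0.513 and 1.231.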

Typical approximations from samples of size 2048 for the Blocks and the 
Doppler signals can be seen in Figure 6 and Figure 7. In each Figure the first row 
illustrates the Gaussian testbed with the usual taut string method in the left and 
the quantile versions in the right panel. The robust method which corresponds 
to the 0.5 quantile is plotted in grey, the other quantiles are plotted in black. 
The Cauchy data are shown in the second row. The left panel demonstrates
the huge range of the observations which lie between -137 and 12383. The right 
panel shows a zoom in and approximation to the quantiles where again the 0.5 
quantile is plotted in grey. Finally the last row shows binary and Poisson data. 

For each of the four signals, three different sample sizes and each of the four 
test models 100 samples were generated and the various taut string methods 
were applied to the data as described above in the description of the test beds. 
For each application of one of the methods to a sample the number of local ex- 
treme values in the approximation was determined. Table 1 reports the median 
number of local extreme values over the simulations. In brackets the mean abso- 
lute deviation from the true number of local extreme values is given apart from 
the samples derived from the Doppler function which has an infinite number of 
local extreme values. 
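
Counting local extreme values of a piecewise constant fit can be done as in the following Python sketch. It assumes that only interior extremes are counted, that is, the first and last constant segments are ignored; this convention is our assumption, chosen because it agrees with the true count of 21 quoted for the Bumps signal. The name `count_local_extremes` and the example vector are hypothetical.

```python
def count_local_extremes(fit):
    """Number of interior constant segments that are strict local maxima or minima."""
    # collapse runs of equal values so that each constant segment appears once
    levels = []
    for value in fit:
        if not levels or value != levels[-1]:
            levels.append(value)
    count = 0
    for prev, cur, nxt in zip(levels, levels[1:], levels[2:]):
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            count += 1
    return count

# an illustrative step function with twelve levels
steps = [0, 4, 4, -1, 2, -2, 3, -1.2, 0.9, 5.2, 2.1, 4.2, 0]
n_extremes = count_local_extremes(steps)
```

A monotone stretch inside the signal, such as the step from -1.2 over 0.9 to 5.2 above, contributes no extreme value even though the fit changes level there.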

These simulations confirm that the usual taut string method is excellent in 
fitting the correct number of local extreme values and very reliably attains the 
correct number of local extreme values already for samples of size 512. However, 
the robust version performs remarkably well in the Gaussian case and has the 
additional advantage that it depends much less on the distribution of the errors 
and performs similarly in the Cauchy test bed. In contrast the approximation 
of 0.1 and 0.9 quantiles is much more difficult. Even for large sample sizes the 
fitted models often miss local extreme values, in particular for the 0.1 quantile 
of the Bumps data set which is an extremely difficult situation. The binary 
problem also appears to be considerably difficult, although still much of the 
underlying structure is recovered using the 0/1 observations. For the Poisson 
case the detection rate of the correct number of local extreme values is nearly 
as good as the robust taut string. 
Fig 6. Typical approximations to the Blocks signal. First row: Approximations to Gaussian
data using usual taut string method and quantile versions. Second row: Approximations to
Cauchy data, original scale left, zoom in right. Last row: Approximations to binary and
Poisson data.

Fig 7. Typical approximations to the Doppler signal. First row: Approximations to Gaussian
data using usual taut string method and quantile versions. Second row: Approximations to
Cauchy data, original scale left, zoom in right. Last row: Approximations to binary and
Poisson data.
Table 1

Comparison of the median number of local extreme values for nine different versions of the
general taut string method, as described in the text. In brackets the mean absolute deviation
from the true number of local extreme values.

Signal       n      1          2          3          4          5          6          7          8          9
                    9 (0.2)    9 (0.0)    9 (0.0)    5 (3.5)    9 (0.0)    6 (3.4)    5 (4.1)    9 (0.6)    9 (0.0)
Bumps        512    21 (0.0)   5 (16.4)   0 (21.0)   7 (15.0)   3 (18.4)   0 (21.0)   1 (19.1)   1 (19.8)   13 (6.9)
(true: 21)   2048   21 (0.0)   13 (8.4)   3 (18.7)   11 (9.2)   9 (11.5)   0 (20.9)   9 (11.4)   7 (13.3)   21 (0.4)
             8192   21 (0.1)   21 (0.0)   9 (11.2)   21 (0.0)   21 (0.0)   2 (18.8)   19 (2.6)   13 (7.8)   21 (0.0)



7. Consistency 

In this section we derive consistency results for fitted regression functions $\hat f$ on
certain intervals on which the fit is monotone while the true regression function
$f_*$ satisfies a Hölder condition.

Precisely, we consider a triangular scheme of observations $(x_i, Y_i) = (x_{in}, Y_{in})$
and auxiliary functions $R_i = R_{in}$. Asymptotic statements refer usually to $n \to
\infty$, and we use the abbreviation

\[ \rho_n := \frac{\log n}{n}. \]

Throughout let $[A, B]$ be a fixed nondegenerate interval such that the following
conditions are satisfied:

(C.1) There are constants $m_o, m^* > 0$ such that the design measure $M_n :=
\sum_{i=1}^n \delta_{x_{in}}$ satisfies

\[ \frac{M_n[a, b]}{n(b - a)} \ge m_o \]

for sufficiently large $n$ and all $A \le a < b \le B$ with $(b - a) \ge m^* \rho_n$.

(C.2) For arbitrary indices $i$ and real numbers $t$,

\[ \rho_{in}'(t\,\pm) := \mathbb{E}\, R_{in}'(t\,\pm) \]

exist. Moreover, there exists a nondecreasing function $H : [0, \infty) \to [0, \infty)$ with
$H(0) = 0$ and $h_o := \liminf_{t \downarrow 0} H(t)/t > 0$ such that for all $s \ge 0$ and indices $i$
with $x_{in} \in [A, B]$,

\[ \pm \rho_{in}'\bigl((f_*(x_{in}) \pm s)\,\pm\bigr) \ge H(s). \]

(C.3) For real numbers $a < b$ and $t$ define

\[ \Delta_n^{(\pm)}(a, b, t) := \sum_{i \,:\, a \le x_{in} \le b} (R_{in}' - \rho_{in}')\bigl((f_*(x_{in}) + t)\,\pm\bigr). \]

Then there exist constants $K_1, K_2 > 0$ such that for all $A \le a < b \le B$ and
$\eta \ge 0$,

\[ \mathbb{P}\Bigl( \sup_{t \in \mathbb{R}} \Delta_n^{(\pm)}(a, b, t)^2 \ge M_n[a, b]\, \eta \Bigr) \le K_1 \exp(-K_2 \eta). \]

Let us comment briefly on these conditions: Condition (C.1) is satisfied if,
for instance, all design points $x_{in}$ are contained in $[A, B]$ with $x_{i+1,n} - x_{in} =
(B - A)/n$ for $1 \le i < n$. It also holds true almost surely if $(x_{in})_{i=1}^n$ is the vector
of order statistics of $(X_i)_{i=1}^n$, where $X_1, X_2, X_3, \ldots$ are i.i.d. with a Lebesgue
density $g$ which is bounded away from zero on $[A, B]$.

As to Conditions (C.2-3), consider first $R_{in}(t) := (t - Y_{in})^2/2$. Then $R_{in}'(t\,\pm) =
t - Y_{in}$ and $\rho_{in}'(t\,\pm) = t - f_*(x_{in})$. Thus Condition (C.2) is satisfied with
$h_o = 1$, and Condition (C.3) amounts to the errors $\epsilon_{in} := Y_{in} - \mu(x_{in})$ having
subgaussian tails uniformly in $x_{in} \in [A, B]$. In case of $R_{in}(t) := \rho_\beta(t -
Y_{in})$, suppose for simplicity that all distribution functions $F(\cdot \mid x)$ are continuous.
Then $R_{in}'(t\,+) = 1\{Y_{in} \le t\} - \beta$, $R_{in}'(t\,-) = 1\{Y_{in} < t\} - \beta$, while
$\rho_{in}'(t\,\pm) = F(t \mid x_{in}) - \beta$. Condition (C.2) is satisfied, for instance, if $Y_{in} =
f_*(x_{in}) + \sigma(x_{in}) Z_i$ for some bounded function $\sigma(\cdot)$ and i.i.d. random errors $Z_i$
with continuous and strictly positive density. Moreover, it follows from empirical
process theory that Condition (C.3) is satisfied with universal constants $K_1$
and $K_2$.

In what follows let $\hat f_n$ be any estimator of $f_*$. Our first consistency result
applies to isotonic regression estimators as well as taut string estimators with
constant tuning vector $\lambda$ via Theorem 2.4. It also applies to the taut string
estimators with adaptively chosen tuning vectors $\lambda$ in case of (11).

Theorem 7.1 Suppose that Conditions (C.1-3) hold and that $f_*$ is Hölder
continuous on $[A, B]$ with exponent $\gamma \in (0, 1]$, i.e.

\[ L := \sup_{A \le x < y \le B} \frac{|f_*(y) - f_*(x)|}{(y - x)^\gamma} < \infty. \]

Let $[A_n, B_n]$ be a fixed or random subinterval of $[A, B]$ with $B_n - A_n \ge m^* \rho_n$
such that $\hat f_n$ is isotonic on $[A_n, B_n]$. Further suppose that either $\hat f_n$ minimizes the
sum $\sum_{i \,:\, x_{in} \in [A_n, B_n]} R_{in}(f(x_{in}))$ over all isotonic functions $f : [A_n, B_n] \to \mathbb{R}$ or
that $\hat f_n = (\hat f_n(x_{in}))_{i=1}^n$ satisfies (11) for all $n$. Then for $\delta_n := \rho_n^{1/(2\gamma+1)}$ and
sufficiently large $C$,

\[ \sup_{x \in [A_n, B_n - \delta_n]} \bigl(\hat f_n(x) - f_*(x)\bigr) \le C \rho_n^{\gamma/(2\gamma+1)}, \]

\[ \sup_{x \in [A_n + \delta_n, B_n]} \bigl(f_*(x) - \hat f_n(x)\bigr) \le C \rho_n^{\gamma/(2\gamma+1)} \]

with asymptotic probability one.

Our second consistency result concerns estimation of $f_*$ close to its local
extrema. It is known that the least squares taut string estimators tend to
underestimate $f_*$ near local maxima and overestimate $f_*$ near local minima. The
generalized estimators discussed here have the same property, but this effect can
be bounded:

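Theorem 7.2 Suppose that Conditions (C.1-3) hold. Let $x_* \in (A, B)$ be a local
extremum of $f_*$ such that

\[ \limsup_{x \to x_*} \frac{|f_*(x) - f_*(x_*)|}{|x - x_*|^\kappa} < \infty \tag{12} \]

for some $\kappa > 0$.

(a) If $\hat f_n$ is a taut string estimator with tuning constants $\lambda_{in}$ in $(0, c_o n^{1/2}]$
for some constant $c_o$, then

\[ \max_{x \in [x_* \pm n^{-1/(2\kappa+2)}]} \hat f_n(x) \ge f_*(x_*) + O_p(n^{-\kappa/(2\kappa+2)}), \]

\[ \min_{x \in [x_* \pm n^{-1/(2\kappa+2)}]} \hat f_n(x) \le f_*(x_*) + O_p(n^{-\kappa/(2\kappa+2)}). \]

(b) Let $\hat f_n$ be any estimator such that $\hat f_n = (\hat f_n(x_{in}))_{i=1}^n$ satisfies (11) for
all $n$. Then for $C$ sufficiently large,

\[ \max_{x \in [x_* \pm \rho_n^{1/(2\kappa+1)}]} \hat f_n(x) \ge f_*(x_*) - C \rho_n^{\kappa/(2\kappa+1)}, \]

\[ \min_{x \in [x_* \pm \rho_n^{1/(2\kappa+1)}]} \hat f_n(x) \le f_*(x_*) + C \rho_n^{\kappa/(2\kappa+1)} \]

with asymptotic probability one.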

To illustrate the latter results, suppose that $f_*$ is twice differentiable with
bounded second derivative on $[A, B]$. If $x_* \in (A, B)$ is a local extremum of $f_*$,
then (12) holds true with $\kappa = 2$. Then the taut string estimator $\hat f_n$ with global
tuning parameter $\lambda = \lambda_n = O(n^{1/2})$ underestimates (resp. overestimates) a
local maximum (resp. minimum) by $O_p(n^{-1/3})$. In case of the adaptively chosen
$\lambda_{in} = \lambda_{in}(\text{data})$, the latter rate improves to $O_p(\rho_n^{2/5})$.
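
As a quick arithmetic check of these exponents (a worked substitution of $\kappa = 2$ into the two rates of Theorem 7.2, with $\rho_n = \log n / n$, and not an additional result):

```latex
\[
\kappa = 2: \qquad
n^{-\kappa/(2\kappa+2)} = n^{-2/6} = n^{-1/3},
\qquad
\rho_n^{\kappa/(2\kappa+1)} = \rho_n^{2/5} = \Bigl(\frac{\log n}{n}\Bigr)^{2/5}.
\]
```

Since $2/5 > 1/3$, the adaptive rate is faster up to the logarithmic factor.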



8. Proofs 

Proof of Lemma 2.1. As mentioned earlier, the necessity of Condition (3)
follows from (2) applied to $\pm\delta^{(jk)}$. On the other hand, it will be shown below
that an arbitrary vector $\delta \in \mathbb{R}^n$ may be written as

\[ \delta = \sum_{1 \le j \le k \le n} \alpha^{(jk)} \delta^{(jk)} \]

with real numbers $\alpha^{(jk)}$ satisfying the following two constraints:

(i) For $1 \le i \le n$ it follows from $\delta_i \ge 0$ (or $\delta_i \le 0$) that $\alpha^{(jk)} \ge 0$ (or $\le 0$)
whenever $i \in \{j, \ldots, k\}$.

(ii) For $1 \le i < n$,

\[ |\delta_{i+1} - \delta_i| = \sum_{k=1}^{i} |\alpha^{(ki)}| + \sum_{k=i+1}^{n} |\alpha^{(i+1,k)}|. \]

With this particular representation of $\delta$, one can easily show that

\[ DT(f, \delta) = \sum_{1 \le j \le k \le n} |\alpha^{(jk)}|\, DT\bigl(f, \mathrm{sign}(\alpha^{(jk)})\, \delta^{(jk)}\bigr) \ge 0. \]

The coefficients $\alpha^{(jk)}$ may be constructed iteratively as follows: Let $\mathcal{J}_0 :=
\{i : \delta_i > 0\}$ and $a_0 := \min\{\delta_i : i \in \mathcal{J}_0\}$. For any maximal index interval
$\{j, \ldots, k\} \subset \mathcal{J}_0$ set $\alpha^{(jk)} := a_0$. Then define $\mathcal{J}_1 := \{i : \delta_i > a_0\}$ and $a_1 :=
\min\{\delta_i - a_0 : i \in \mathcal{J}_1\}$. For any maximal index interval $\{j, \ldots, k\} \subset \mathcal{J}_1$ set
$\alpha^{(jk)} := a_1$. Then define $\mathcal{J}_2 := \{i : \delta_i > a_0 + a_1\}$ and proceed analogously, until we
end up with an empty set $\mathcal{J}_\ell$. Similarly, one may start with $\mathcal{K}_0 := \{i : \delta_i < 0\}$,
$b_0 := \max\{\delta_i : i \in \mathcal{K}_0\}$, and define $\alpha^{(jk)}$ for selected index intervals $\{j, \ldots, k\}$.

□
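
The iterative construction in this proof can be sketched in Python as follows. The sketch assumes that each basis vector is the indicator of an index interval, peels off layers from the positive part and then from the negative part of the increment vector, and adds up the coefficients when the same interval occurs in several layers. The name `decompose` and the dictionary representation are hypothetical.

```python
def decompose(delta):
    """Write delta as a sum of coefficient * indicator{j..k}; returns {(j, k): coefficient}."""
    coefficients = {}
    n = len(delta)
    for sign in (1, -1):                       # positive part first, then negative part
        rest = [max(sign * d, 0) for d in delta]
        while any(r > 0 for r in rest):
            layer = min(r for r in rest if r > 0)
            i = 0
            while i < n:
                if rest[i] > 0:
                    j = i
                    while i < n and rest[i] > 0:
                        i += 1                 # extend to a maximal run of positive entries
                    key = (j + 1, i)           # 1-based interval {j + 1, ..., i}
                    coefficients[key] = coefficients.get(key, 0) + sign * layer
                else:
                    i += 1
            rest = [r - layer if r > 0 else 0 for r in rest]
    return coefficients

delta = [2, 1, 0, -3, -1, 4]
coef = decompose(delta)
rebuilt = [sum(a for (j, k), a in coef.items() if j <= i <= k) for i in range(1, 7)]
```

Here `rebuilt` reproduces `delta`, and at every position the absolute coefficients of the intervals ending there or starting right after it add up to the absolute jump, which is the content of constraint (ii).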

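Proof of Lemma 2.2. The necessity of Condition (4) follows from (2) if
applied to $\pm\delta^{(1k)}$.

It remains to be shown that any vector $f$ satisfying (4) for $1 \le k \le n$ satisfies
(2) as well. Note that for any $\delta \in \mathbb{R}^n$,

\begin{align*}
\sum_{i=1}^n \delta_i R_i'(f_i) &= -\sum_{i=1}^n (\delta_n - \delta_i) R_i'(f_i) \\
&= -\sum_{i=1}^{n-1} \sum_{k=i}^{n-1} (\delta_{k+1} - \delta_k) R_i'(f_i) \\
&= -\sum_{k=1}^{n-1} (\delta_{k+1} - \delta_k) \sum_{i=1}^k R_i'(f_i),
\end{align*}

since $\sum_{i=1}^n R_i'(f_i) = 0$. Consequently,

\[ DT(f, \delta) = \sum_{k=1}^{n-1} |\delta_{k+1} - \delta_k|\, H_k, \]

where

\[ H_k := \mathrm{sign}(\delta_{k+1} - \delta_k) \Bigl( \lambda_k\, \mathrm{sign}(f_{k+1} - f_k) - \sum_{i=1}^k R_i'(f_i) \Bigr) + \lambda_k 1\{f_{k+1} = f_k\}. \]

But condition (4) entails that all these quantities $H_k$ are nonnegative. □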
Proof of Lemma 2.3. Let $J = \{j, \ldots, k\}$ be a maximal index interval such that
$f_j = \cdots = f_k = \max_i f_i$. Then it follows from Lemma 2.2 that

\[ \sum_{i=j}^k R_i'(f_i) = -(\lambda_{j-1} + \lambda_k). \]

If j > 1 or fc < n, the right hand side is strictly negative, whence fj < z r . If 
j = 1 and k = n, then /i < z r , because X)"=i ^i( z ' - ) > 0. These considerations 
show that max, /j < z r , and analogous arguments reveal that min^ f t > zg. □ 
Our proof of Theorem 2.4 relies on a characterization of isotonic fits which 
is of independent interest. 

Theorem 8.1 Let $1 \le a \le b \le n$. A vector $(f_i)_{i=a}^b$ minimizes $\sum_{i=a}^b R_i(f_i)$
under the constraint that $f_a \le \cdots \le f_b$ if and only if for arbitrary $a \le j \le
k \le b$,

\[ \sum_{i=j}^k R_i'(f_i\,-) \le 0 \quad\text{whenever } f_{j-1} < f_j = f_k, \tag{13} \]

\[ \sum_{i=j}^k R_i'(f_i\,+) \ge 0 \quad\text{whenever } f_j = f_k < f_{k+1}. \tag{14} \]

Here $f_{a-1} := -\infty$ and $f_{b+1} := \infty$.

Proof of Theorem 8.1. For notational convenience let $a = 1$ and $b = n$.

Note that the functional $f \mapsto T_o(f) := \sum_{i=1}^n R_i(f_i)$ is convex on $\mathbb{R}^n$ and that
the set $\mathbb{R}^n_\uparrow$ of vectors in $\mathbb{R}^n$ with non-decreasing components is convex. Thus an
isotonic vector $f$ minimizes $T_o$ over $\mathbb{R}^n_\uparrow$ if, and only if,

\[ DT_o(f, \delta) = \sum_{i=1}^n R_i'(f_i\,\mathrm{sign}(\delta_i))\, \delta_i \ge 0 \]

for any $\delta \in \mathbb{R}^n$ such that $f + t\delta$ is isotonic for some $t > 0$. The latter requirement
is equivalent to

\[ \delta_i \le \delta_j \quad\text{whenever } i < j \text{ and } f_i = f_j. \tag{15} \]

Condition (15) is satisfied for $\delta = -\delta^{(jk)}$ if $f_{j-1} < f_j$ and for $\delta = \delta^{(jk)}$ if
$f_k < f_{k+1}$. Thus the conditions stated in Theorem 8.1 are necessary.

On the other hand, one can easily show that any $\delta$ satisfying (15) may be
written as a sum $\sum_{1 \le j \le k \le n} \alpha^{(jk)} \delta^{(jk)}$ with real numbers $\alpha^{(jk)}$ satisfying (i) in
the proof of Lemma 2.1 and

(iii) $\alpha^{(jk)} < 0$ (resp. $\alpha^{(jk)} > 0$) implies that $f_{j-1} < f_j$ (resp. $f_k < f_{k+1}$).

One can deduce from this representation that

\[ DT_o(f, \delta) = \sum_{1 \le j \le k \le n} |\alpha^{(jk)}|\, DT_o\bigl(f, \mathrm{sign}(\alpha^{(jk)})\, \delta^{(jk)}\bigr), \]

and each summand on the right hand side is non-negative by (13) and (14). □
Proof of Theorem 2.4. We have to verify Conditions (13) and (14) with
$f_i = \hat f_i$ for $a \le i \le b$. But it follows from (3) and our assumptions on $\lambda$ that the
sum in (13) is not greater than $\lambda_{j-1}\,\mathrm{sign}(f_{j-1} - f_j) + \lambda_k\,\mathrm{sign}(f_{k+1} - f_k) = 0$, while
the sum in (14) is not smaller than $\lambda_{j-1}\,\mathrm{sign}(f_{j-1} - f_j) + \lambda_k\,\mathrm{sign}(f_{k+1} - f_k) = 0$.

□

Proof of Lemma 3.1. Suppose first that c > Mi m (v). For any z > c, 

>u >« 

whence M j m (u + v) < c. Moreover, 

R' jm (M tm (v)) = R' jk (M em (v)) + R' em (M em (v)) < u + v, 

V v ' S v ' 

<U = V 

so that Mj m {u + v) > Mt m {v). This proves Part (a). 

As for Part (b), suppose that c > M j m (u + v) . Then for c > z > Mj m (u + v), 

u + v < R' jm (z) = R' jk (z) + R' tm {z) < u + R' em (z), 
so that R' tm (z) > v. Hence Mg m (v) < Mj m (u + v). □ 

Proof of Theorem 3.2. Let J = {j, . . . , k} be a local maximum of /. A 
close inspection of Algorithm I reveals that 

fi = Mjki-^j-i ~ Afc) for i G J. 
On the other hand, it follows from our assumption on / that 

k 
i=3 



so that 



max fi > max fi 



Analogously one can show that min^g^; fi < min^x; fi f° r an y local minimum 
JC of /. □ 

Proof of Theorem 4.1. It follows from Assumption (7) that $T_\epsilon$ converges
pointwise to $T$ as $\epsilon \downarrow 0$. Since all functions $T$ and $T_\epsilon$ are convex, it is well-known
from convex analysis that the convergence is even uniform on arbitrary compact
sets. Specifically consider the closed ball $B_R(0)$ around $0$ with radius $R > 0$. It
follows from (A.2) that for suitable $R > 0$,

\[ T(0) < \min_{f \in \partial B_R(0)} T(f). \]

Hence for some $\epsilon_o > 0$,

\[ T_\epsilon(0) < \min_{f \in \partial B_R(0)} T_\epsilon(f) \quad\text{for } 0 < \epsilon \le \epsilon_o. \]

These inequalities and convexity of the functions $T$ and $T_\epsilon$ together entail that
the sets $\mathcal{F}$ and $\mathcal{F}_\epsilon$, $0 < \epsilon \le \epsilon_o$, are nonvoid and compact subsets of $B_R(0)$. Now
convergence of $\mathcal{F}_\epsilon$ to $\mathcal{F}$ follows easily from

\[ \mathcal{F}_\epsilon \subset \Bigl\{ f \in B_R(0) : T(f) \le \min_{g \in B_R(0)} T(g) + 2 \max_{g \in B_R(0)} |T(g) - T_\epsilon(g)| \Bigr\}. \quad □ \]

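Proof of Theorem 4.2. That $g \in (\beta, n - 1 + \beta)^n$ follows from Lemma 2.3 and
the fact that $\tilde R_i'(\beta) \le 0$ with strict inequality if $Z_i > 1$, while $\tilde R_i'(n - 1 + \beta) \ge 0$
with strict inequality if $Z_i < n$. Thus $f$ is well-defined, and it suffices to verify
(3) for indices $1 \le j \le k \le n$. But the definitions of $f$ and $\tilde R_i$ entail that

\begin{align*}
\sum_{i=j}^k R_i'(f_i\,+) &= \sum_{i=j}^k \bigl(1\{Y_i \le f_i\} - \beta\bigr) \\
&\ge \sum_{i=j}^k \bigl(1\{Z_i \le \lceil g_i \rceil\} - \beta\bigr) \\
&\ge \sum_{i=j}^k \tilde R_i'(g_i).
\end{align*}

According to Lemma 2.1, applied to $\tilde T$ in place of $T$, the right hand side is not
smaller than

\begin{align*}
& \lambda_{j-1}\,\mathrm{sign}(g_{j-1} - g_j) + \lambda_k\,\mathrm{sign}(g_{k+1} - g_k) \\
&= \lambda_{j-1}\bigl(1 - 2 \cdot 1\{g_{j-1} \le g_j\}\bigr) + \lambda_k\bigl(1 - 2 \cdot 1\{g_{k+1} \le g_k\}\bigr) \\
&\ge \lambda_{j-1}\bigl(1 - 2 \cdot 1\{f_{j-1} \le f_j\}\bigr) + \lambda_k\bigl(1 - 2 \cdot 1\{f_{k+1} \le f_k\}\bigr) \\
&= \lambda_{j-1}\,\mathrm{sign}(f_{j-1} - f_j) + \lambda_k\,\mathrm{sign}(f_{k+1} - f_k).
\end{align*}

This proves the first part of (3). Similarly,

\begin{align*}
\sum_{i=j}^k R_i'(f_i\,-) &= \sum_{i=j}^k \bigl(1\{Y_i < f_i\} - \beta\bigr) \\
&\le \sum_{i=j}^k \bigl(1\{Z_i < \lceil g_i \rceil\} - \beta\bigr) \\
&\le \sum_{i=j}^k \tilde R_i'(g_i) \\
&\le \lambda_{j-1}\,\mathrm{sign}(g_{j-1} - g_j) + \lambda_k\,\mathrm{sign}(g_{k+1} - g_k) \\
&\le \lambda_{j-1}\,\mathrm{sign}(f_{j-1} - f_j) + \lambda_k\,\mathrm{sign}(f_{k+1} - f_k). \quad □
\end{align*}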
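Proof of Theorem 7.1. It follows from condition (C.3) that

\[ \sup_{A \le a < b \le B,\ t \in \mathbb{R}} \Delta_n^{(\pm)}(a, b, t)^2 \le \eta\, M_n[a, b] \log n \tag{16} \]

with probability at least

\[ 1 - n^2 K_1 \exp(-K_2 \eta \log n) \to 1 \quad\text{if } \eta > 2/K_2. \]

Suppose that $\hat f_n(x_n) > f_*(x_n) + \epsilon_n$ for some $x_n \in [A_n, B_n - \delta_n]$ and $\epsilon_n =
C \rho_n^{\gamma/(2\gamma+1)} = C \delta_n^\gamma$ with $C$ to be specified later. Then for $x_n \le x \le x_n + \delta_n$,

\[ \hat f_n(x) - f_*(x) \ge \hat f_n(x_n) - f_*(x_n) - |f_*(x) - f_*(x_n)| > (C - L)\, \delta_n^\gamma. \]

If $\hat f_n$ minimizes the sum $\sum_{i \,:\, A_n \le x_{in} \le B_n} R_{in}(f(x_{in}))$ over all isotonic functions
$f$ on $[A_n, B_n]$, we assume without loss of generality that $\hat f_n(x_{in}) < \hat f_n(x_n)$
whenever $A_n \le x_{in} < x_n$. For otherwise we could replace $x_n$ with the smallest design
point $x_{in}$ in $[A_n, B_n]$ such that $\hat f_n(x_{in}) = \hat f_n(x_n)$. Then Theorem 8.1 entails
that

\begin{align*}
0 &\ge \sum_{i \,:\, x_n \le x_{in} \le x_n + \delta_n} R_{in}'(\hat f_n(x_{in})\,-) \\
&\ge \sum_{i \,:\, x_n \le x_{in} \le x_n + \delta_n} R_{in}'\bigl((f_*(x_{in}) + (C - L)\delta_n^\gamma)\,+\bigr) \\
&\ge \sum_{i \,:\, x_n \le x_{in} \le x_n + \delta_n} \rho_{in}'\bigl((f_*(x_{in}) + (C - L)\delta_n^\gamma)\,+\bigr) - \bigl(\eta\, M_n[x_n, x_n + \delta_n] \log n\bigr)^{1/2} \\
&\ge H\bigl((C - L)\delta_n^\gamma\bigr)\, M_n[x_n, x_n + \delta_n] - \bigl(\eta\, M_n[x_n, x_n + \delta_n] \log n\bigr)^{1/2}
\end{align*}

in case of $C > L$ and (16). If $\hat f_n$ is an arbitrary estimator satisfying (11), then
only

\begin{align*}
\bigl(c_o M_n[x_n, x_n + \delta_n] \log n\bigr)^{1/2} + c_o \log n
&\ge \sum_{i \,:\, x_n \le x_{in} \le x_n + \delta_n} R_{in}'(\hat f_n(x_{in})\,-) \\
&\ge H\bigl((C - L)\delta_n^\gamma\bigr)\, M_n[x_n, x_n + \delta_n] - \bigl(\eta\, M_n[x_n, x_n + \delta_n] \log n\bigr)^{1/2}.
\end{align*}

But $\delta_n \ge m^* \rho_n$ for sufficiently large $n$, and then the preceding displayed
inequalities entail that

\begin{align*}
H\bigl((C - L)\delta_n^\gamma\bigr) &\le \bigl((\eta/m_o)^{1/2} + (c_o/m_o)^{1/2}\bigr)(\rho_n/\delta_n)^{1/2} + (c_o/m_o)(\rho_n/\delta_n) \\
&= \bigl((\eta/m_o)^{1/2} + (c_o/m_o)^{1/2} + o(1)\bigr)\, \delta_n^\gamma.
\end{align*}

Hence $C \le L + \bigl((\eta/m_o)^{1/2} + (c_o/m_o)^{1/2} + o(1)\bigr)/h_o$.

The assertion about the maximum of $f_* - \hat f_n$ on the interval $[A_n + \delta_n, B_n]$ is
proved analogously. □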
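Proof of Theorem 7.2. We only prove the assertion about the minimum
of $\hat f_n$ over a neighborhood of $x_*$, because the other part follows analogously.
Suppose that $\hat f_n \ge f_* + \epsilon_n$ on $[x_* \pm \delta_n]$, where both $\delta_n > 0$ and $\epsilon_n > 0$ are fixed
numbers tending to zero, while $\delta_n \ge m^* \rho_n$.

In case of a taut string estimator with parameters $\lambda_{in} \in (0, c_o n^{1/2}]$, it follows
from (3) and (16) that

\begin{align*}
2 c_o n^{1/2} &\ge \sum_{i \,:\, |x_{in} - x_*| \le \delta_n} R_{in}'(\hat f_n(x_{in})\,-) \\
&\ge \sum_{i \,:\, |x_{in} - x_*| \le \delta_n} R_{in}'\bigl((f_*(x_{in}) + \epsilon_n)\,-\bigr) \\
&\ge M_n[x_* \pm \delta_n]\, H(\epsilon_n) - O_p\bigl(M_n[x_* \pm \delta_n]^{1/2}\bigr) \\
&\ge (2 h_o + o(1))\, m_o\, n\, \delta_n \epsilon_n - O_p(n^{1/2}),
\end{align*}

where $h_o := \liminf_{t \downarrow 0} H(t)/t$. Hence

\[ \epsilon_n \le O_p\bigl(n^{-1/2} \delta_n^{-1}\bigr). \]

On the other hand,

\[ \sup_{x \in [x_* \pm \delta_n]} |f_*(x) - f_*(x_*)| \le O(\delta_n^\kappa). \tag{17} \]

Hence setting $\delta_n := n^{-1/(2\kappa+2)}$ yields the assertion.
In case of any estimator satisfying (11),

\begin{align*}
\bigl(c_o M_n[x_* \pm \delta_n] \log n\bigr)^{1/2} + c_o \log n
&\ge \sum_{i \,:\, |x_{in} - x_*| \le \delta_n} R_{in}'(\hat f_n(x_{in})\,-) \\
&\ge M_n[x_* \pm \delta_n]\, H(\epsilon_n) - \bigl(\eta\, M_n[x_* \pm \delta_n] \log n\bigr)^{1/2},
\end{align*}

i.e. $\epsilon_n$ equals

\[ O\bigl((\log(n)/M_n[x_* \pm \delta_n])^{1/2} + \log(n)/M_n[x_* \pm \delta_n]\bigr) = O\bigl((\rho_n/\delta_n)^{1/2} + \rho_n/\delta_n\bigr). \]

Comparing this with (17) shows that one should take $\delta_n = \rho_n^{1/(2\kappa+1)}$, and this
yields the assertion about the minimum of $\hat f_n$. □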

Software 

The generalized taut string algorithm has been implemented in the ftnonpar
package for the statistics software R (Ihaka and Gentleman, 1996). This add-on
package can be downloaded and installed by the standard install.packages()
command of R. All examples considered in this paper are available via the general
genpmreg function, using the method parameter to choose from the usual taut
string method, the quantile version and the versions for binomial and Poisson
noise.